Methods and compositions for evaluating breast cancer prognosis

ABSTRACT

Methods and compositions for evaluating the prognosis of a breast cancer patient, particularly an early-stage breast cancer patient, are provided. The methods of the invention comprise detecting expression of at least one, more particularly at least two, biomarker(s) in a body sample, wherein overexpression of the biomarker or a combination of biomarkers is indicative of breast cancer prognosis. In some embodiments, the body sample is a breast tissue sample, particularly a primary breast tumor sample. The biomarkers of the invention are proteins and/or genes whose overexpression is indicative of either a good or bad cancer prognosis. Biomarkers of interest include proteins and genes involved in cell cycle regulation, DNA replication, transcription, signal transduction, cell proliferation, invasion, proteolysis, or metastasis. In some aspects of the invention, overexpression of a biomarker of interest is detected at the protein level using biomarker-specific antibodies or at the nucleic acid level using nucleic acid hybridization techniques.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a divisional application of U.S. Utility application Ser. No. 11/233,510, filed Sep. 22, 2005, which claims the benefit of U.S. Provisional Application Ser. No. 60/612,073, filed Sep. 22, 2004, and U.S. Provisional Application Ser. No. 60/611,965, filed Sep. 22, 2004, all of which are incorporated herein by reference in their entirety.

REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS-WEB

The official copy of the sequence listing is submitted concurrently with the specification as a text file via EFS-Web, in compliance with the American Standard Code for Information Interchange (ASCII), with a file name of 372587SequenceListing.txt, a creation date of May 14, 2009, and a size of 161 KB. The sequence listing filed via EFS-Web is part of the specification and is hereby incorporated in its entirety by reference herein.

FIELD OF THE INVENTION

The present invention relates to methods and compositions for evaluating the prognosis of a patient afflicted with breast cancer, particularly early-stage breast cancer.

BACKGROUND OF THE INVENTION

Breast cancer is the second most common cancer among American women, less frequent only than skin cancer. An American woman has a one in eight chance of developing breast cancer during her lifetime, and the American Cancer Society estimates that more than 250,000 new cases of breast cancer will be reported in the U.S. this year. Breast cancer is the second leading cause of cancer deaths in women, with more than 40,000 Americans expected to die from the disease in 2004.

Improved detection methods, mass screening, and advances in treatment over the last decade have significantly improved the outlook for women diagnosed with breast cancer. Today, approximately 80% of breast cancer cases are diagnosed in the early stages of the disease when survival rates are at their highest. As a result, about 85% of breast cancer patients are alive at least 5 years after diagnosis.

Despite these advances, approximately 20% of women diagnosed with early-stage breast cancer have a poor ten-year outcome and will suffer disease recurrence, metastasis, or death within this time period. The remaining 80% of breast cancer patients diagnosed at an early stage, however, have a good 10-year prognosis and are unlikely to need, or benefit from, additional aggressive adjuvant therapy (e.g., chemotherapy). The current clinical consensus is that at least some early-stage, node-negative breast cancer patients should receive adjuvant chemotherapy, but presently there are no widely used assays to risk stratify patients for more aggressive treatment. Since the majority of these early-stage cancer patients enjoy long-term survival following surgery and/or radiation therapy without further treatment, it is likely inappropriate to recommend aggressive adjuvant therapy for all of these patients, particularly in light of the significant side effects associated with cancer chemotherapeutics. Compositions and methods that permit the differentiation of these populations of early-stage breast cancer patients at the time of initial diagnosis into good and bad prognosis groups would assist clinicians in selecting appropriate courses of treatment. Thus, methods for evaluating the prognosis of breast cancer patients, particularly early-stage breast cancer patients, are needed.

Significant research has focused on identifying methods and factors for assessing breast cancer prognosis and predicting therapeutic response. (See generally, Ross and Hortobagyi, eds. (in press) Molecular Oncology of Breast Cancer (Jones and Bartlett Publishers, Boston, Mass.) and the references cited therein, all of which are herein incorporated by reference in their entirety). Prognostic indicators include more conventional factors, such as tumor size, nodal status, and histological grade, as well as molecular markers that provide some information regarding prognosis and likely response to particular treatments. For example, determination of estrogen (ER) and progesterone (PR) steroid hormone receptor status has become a routine procedure in assessment of breast cancer patients. See, for example, Fitzgibbons et al. (2000) Arch. Pathol. Lab. Med. 124:966-978. Tumors that are hormone receptor positive are more likely to respond to hormone therapy and also typically grow less aggressively, thereby resulting in a better prognosis for patients with ER+/PR+ tumors.

Overexpression of human epidermal growth factor receptor 2 (HER-2/neu), a transmembrane tyrosine kinase receptor protein, has been correlated with poor breast cancer prognosis. Ross et al. (2003) The Oncologist:307-325. Her2/neu expression levels in breast tumors are currently used to predict response to the anti-Her-2/neu antibody therapeutic trastuzumab (Herceptin®; Genentech). See, for example, Id. and Ross et al., supra. Furthermore, approximately one-third of breast cancers have mutations in the tumor suppressor gene p53, and these mutations have been associated with increased disease aggressiveness and poor prognostic outcome. Fitzgibbons et al., supra. Ki-67 is a non-histone nuclear protein that is expressed during the G1 through M phases of the cell cycle. Studies have shown that overexpression of the cellular proliferation marker Ki-67 also correlates with poor breast cancer prognosis. Id.

Although current prognostic criteria and molecular markers provide some guidance in predicting patient outcome and selecting an appropriate course of treatment, a significant need exists for a specific and sensitive method for evaluating breast cancer prognosis, particularly in early-stage, lymph node-negative patients. Such a method should specifically distinguish breast cancer patients with a poor prognosis from those with a good prognosis and permit the identification of high-risk, early-stage breast cancer patients who are likely to need aggressive adjuvant therapy.

SUMMARY OF THE INVENTION

Methods and compositions for evaluating the prognosis of a cancer patient, particularly a breast cancer patient, are provided. The methods comprise detecting expression of at least one, more particularly at least two, biomarker(s) in a body sample, wherein the overexpression of a biomarker or combination of biomarkers is indicative of cancer prognosis. Overexpression of the biomarker or combination of biomarkers of the invention is indicative of either a good prognosis (i.e., disease-free survival) or a bad prognosis (i.e., cancer recurrence, metastasis, or death from the underlying cancer). Thus, the present method permits the differentiation of breast cancer patients with a good prognosis from those patients with a bad prognosis. The methods disclosed herein can be used in combination with assessment of conventional clinical factors (e.g., tumor size, tumor grade, lymph node status, and family history) and/or analysis of the expression level of molecular markers, such as Her2/neu, Ki67, p53, and estrogen and progesterone hormone receptors. In this manner, the methods of the invention permit a more accurate evaluation of breast cancer prognosis.

The biomarkers of the invention are proteins and/or genes whose overexpression is indicative of cancer prognosis, including those biomarkers involved in cell cycle regulation, DNA replication, transcription, signal transduction, cell proliferation, invasion, or metastasis. The detection of overexpression of the biomarker genes or proteins of the invention permits the evaluation of cancer prognosis and facilitates the separation of breast cancer patients into good and bad prognosis risk groups for the purposes of, for example, treatment selection.

Biomarker expression can be assessed at the protein or nucleic acid level. In some embodiments, immunohistochemistry techniques are provided that utilize antibodies to detect the expression of biomarker proteins in breast tumor samples. In this aspect of the invention, at least one antibody directed to a specific biomarker of interest is used. Expression can also be detected by nucleic acid-based techniques, including, for example, hybridization and RT-PCR.

Compositions include monoclonal antibodies capable of binding to biomarker proteins of the invention. Antigen-binding fragments and variants of these monoclonal antibodies, hybridoma cell lines producing these antibodies, and isolated nucleic acid molecules encoding the amino acid sequences of these monoclonal antibodies are also encompassed herein. Kits comprising reagents for practicing the methods of the invention are further provided.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the distribution of percentage of cells staining with an intensity of 2 as a function of actual breast cancer outcome. Experimental details are provided in Example 4.

FIG. 2 provides the ROC curve obtained using the sequence-based interpretation approach for the SLPI/p21ras/E2F1/PSMB9/src/phospho-p27 combination. Experimental details are provided in Example 5.

FIG. 3 provides the Kaplan-Meier plot for the prognostic performance of the SLPI, src, PSMB9, p21ras, and E2F1 biomarker panel. Details are provided in Example 8.

FIG. 4 provides a graphical representation of the long-term survival data for the general breast cancer patient population, independent of analysis of biomarker overexpression. Details are provided in Example 8.

FIG. 5 provides the Kaplan-Meier plot for the prognostic performance of the SLPI, src, PSMB9, p21ras, E2F1, and MUC-1 biomarker panel. Details are provided in Example 9.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides methods and compositions for evaluating the prognosis of a cancer patient, particularly a breast cancer patient, more particularly an early-stage breast cancer patient. The methods comprise detecting the expression of biomarkers in a patient tissue or body fluid sample and determining if said biomarkers are overexpressed. Overexpression of a biomarker or combination of biomarkers used in the practice of the invention is indicative of breast cancer prognosis (i.e., bad or good prognosis). Thus, overexpression of a particular biomarker or combination of biomarkers of interest permits the differentiation of breast cancer patients that are likely to experience disease recurrence (i.e., poor prognosis) from those who are more likely to remain cancer-free (i.e., good prognosis). In some aspects of the invention, the methods involve detecting the overexpression of at least one biomarker in a breast tumor sample that is indicative of a poor breast cancer prognosis and thereby identifying patients who are more likely to suffer a recurrence of the underlying cancer. The methods of the invention can also be used to assist in selecting appropriate courses of treatment and to identify patients that would benefit from more aggressive therapy. In particular embodiments, antibodies and immunohistochemistry techniques are used to detect expression of a biomarker of interest and to evaluate the prognosis of a breast cancer patient. Monoclonal antibodies specific for biomarkers of interest and kits for practicing the methods of the invention are further provided.

By “breast cancer” is intended, for example, those conditions classified by biopsy as malignant pathology. The clinical delineation of breast cancer diagnoses is well-known in the medical arts. One of skill in the art will appreciate that breast cancer refers to any malignancy of the breast tissue, including, for example, carcinomas and sarcomas. In particular embodiments, the breast cancer is ductal carcinoma in situ (DCIS), lobular carcinoma in situ (LCIS), or mucinous carcinoma. Breast cancer also refers to infiltrating ductal (IDC) or infiltrating lobular carcinoma (ILC). In most embodiments of the invention, the subject of interest is a human patient suspected of or actually diagnosed with breast cancer.

The American Joint Committee on Cancer (AJCC) has developed a standardized system for breast cancer staging using a “TNM” classification scheme. Patients are assessed for primary tumor size (T), regional lymph node status (N), and the presence/absence of distant metastasis (M) and then classified into stages 0-IV based on this combination of factors. In this system, primary tumor size is categorized on a scale of 0-4 (T0: no evidence of primary tumor; T1: ≤2 cm; T2: >2 cm to ≤5 cm; T3: >5 cm; T4: tumor of any size with direct spread to chest wall or skin). Lymph node status is classified as N0-N3 (N0: regional lymph nodes are free of metastasis; N1: metastasis to movable, same-side axillary lymph node(s); N2: metastasis to same-side lymph node(s) fixed to one another or to other structures; N3: metastasis to same-side lymph nodes beneath the breastbone). Metastasis is categorized by the absence (M0) or presence of distant metastases (M1). While breast cancer patients at any clinical stage are encompassed by the present invention, patients with early-stage breast cancer are of particular interest. By “early-stage breast cancer” is intended stages 0 (in situ breast cancer), I (T1, N0, M0), IIA (T0-1, N1, M0 or T2, N0, M0), and IIB (T2, N1, M0 or T3, N0, M0). Early-stage breast cancer patients exhibit little or no lymph node involvement. As used herein, “lymph node involvement” or “lymph node status” refers to whether the cancer has metastasized to the lymph nodes. Breast cancer patients are classified as “lymph node-positive” or “lymph node-negative” on this basis. Methods of identifying breast cancer patients and staging the disease are well known and may include manual examination, biopsy, review of the patient's and/or family history, and imaging techniques, such as mammography, magnetic resonance imaging (MRI), and positron emission tomography (PET).
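By way of illustration only, the definition of “early-stage breast cancer” given above can be expressed as a lookup on the T, N, and M categories. The following is a minimal sketch and not a clinical staging tool; the function name, the table name, and the `in_situ` flag used to represent stage 0 are hypothetical and do not appear in the AJCC scheme or elsewhere in this specification.

```python
from typing import Optional

# (T, N) combinations listed above for each early stage; M must be 0.
# The table name and layout are illustrative assumptions.
EARLY_STAGE_TABLE = {
    (1, 0): "I",
    (0, 1): "IIA",
    (1, 1): "IIA",
    (2, 0): "IIA",
    (2, 1): "IIB",
    (3, 0): "IIB",
}


def early_stage(t: int, n: int, m: int, in_situ: bool = False) -> Optional[str]:
    """Return "0", "I", "IIA", or "IIB" for early-stage disease, else None.

    t, n, m are the numeric TNM categories (e.g., T2 -> 2).
    in_situ is a hypothetical flag marking in situ (stage 0) disease.
    """
    if m != 0:
        # Distant metastasis is never early-stage under the definition above.
        return None
    if in_situ:
        return "0"
    return EARLY_STAGE_TABLE.get((t, n))


if __name__ == "__main__":
    print(early_stage(1, 0, 0))  # stage I
    print(early_stage(3, 1, 0))  # not in the early-stage definition
```

Combinations not listed in the definition (for example, T3, N1, M0) return None, meaning only that they fall outside the early-stage definition used herein.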

The term “prognosis” is recognized in the art and encompasses predictions about the likely course of disease or disease progression, particularly with respect to likelihood of disease remission, disease relapse, tumor recurrence, metastasis, and death. “Good prognosis” refers to the likelihood that a patient afflicted with cancer, particularly breast cancer, will remain disease-free (i.e., cancer-free). “Poor prognosis” is intended to mean the likelihood of a relapse or recurrence of the underlying cancer or tumor, metastasis, or death. Cancer patients classified as having a “good outcome” remain free of the underlying cancer or tumor. In contrast, “bad outcome” cancer patients experience disease relapse, tumor recurrence, metastasis, or death. In particular embodiments, the time frame for assessing prognosis and outcome is, for example, less than one year, one, two, three, four, five, six, seven, eight, nine, ten, fifteen, twenty or more years. As used herein, the relevant time for assessing prognosis or disease-free survival time begins with the surgical removal of the tumor or suppression, mitigation, or inhibition of tumor growth. Thus, for example, in particular embodiments, a “good prognosis” refers to the likelihood that a breast cancer patient will remain free of the underlying cancer or tumor for a period of at least five years, more particularly at least ten years. In further aspects of the invention, a “bad prognosis” refers to the likelihood that a breast cancer patient will experience disease relapse, tumor recurrence, metastasis, or death within less than five years, more particularly less than ten years. Time frames for assessing prognosis and outcome provided above are illustrative and are not intended to be limiting.

In some embodiments described herein, prognostic performance of the biomarkers and/or other clinical parameters was assessed utilizing a Cox Proportional Hazards Model Analysis, which is a regression method for survival data that provides an estimate of the hazard ratio and its confidence interval. The Cox model is a well-recognized statistical technique for exploring the relationship between the survival of a patient and particular variables. This statistical method permits estimation of the hazard (i.e., risk) of individuals given their prognostic variables (e.g., overexpression of particular biomarkers, as described herein). Cox model data are commonly presented as Kaplan-Meier curves. The “hazard ratio” is the risk of death at any given time point for patients displaying particular prognostic variables, relative to patients not displaying those variables. See generally Spruance et al. (2004) Antimicrob. Agents & Chemo. 48:2787-2792. In particular embodiments, the biomarkers of interest are statistically significant for assessment of the likelihood of breast cancer recurrence or death due to the underlying breast cancer. Methods for assessing statistical significance are well known in the art and include, for example, using a log-rank test, Cox analysis, and Kaplan-Meier curves. In some aspects of the invention, a p-value of less than 0.05 constitutes statistical significance.
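For orientation, the Kaplan-Meier curves mentioned above can be computed with the standard product-limit estimator. The sketch below implements only that estimator in plain Python; it is not the Cox regression used in the Examples, and the function name and the follow-up data are invented solely for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- follow-up time for each patient (any consistent unit)
    events -- True if recurrence/death was observed at that time,
              False if the patient was censored (lost to follow-up)
    """
    events_at = Counter(t for t, observed in zip(times, events) if observed)
    survival = 1.0
    curve = []
    for t in sorted(events_at):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - events_at[t] / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up data (years); not taken from any Example herein.
    times = [2, 3, 3, 5, 8]
    events = [True, True, False, True, False]
    for t, s in kaplan_meier(times, events):
        print(f"t={t}: S(t)={s:.2f}")  # roughly 0.80, 0.60, 0.30
```

Curves computed in this way for two patient groups (e.g., biomarker-positive and biomarker-negative) are what a log-rank test compares.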

As described herein above, a number of clinical and prognostic breast cancer factors are known in the art and are used to predict treatment outcome and the likelihood of disease recurrence. Such factors include lymph node involvement, tumor size, histologic grade, family history, estrogen and progesterone hormone receptor status, Her2/neu levels, and tumor ploidy. As used herein, estrogen and progesterone hormone receptor status refers to whether these receptors are expressed in the breast tumor of a particular breast cancer patient. Thus, an “estrogen receptor-positive patient” displays estrogen receptor expression in a breast tumor, whereas an “estrogen receptor-negative patient” does not. Using the methods of the present invention, the prognosis of a breast cancer patient can be determined independent of or in combination with assessment of these or other clinical and prognostic factors. In some embodiments, combining the methods disclosed herein with evaluation of other prognostic factors may permit a more accurate determination of breast cancer prognosis. The methods of the invention may be coupled with analysis of, for example, Her2/neu, Ki67, and/or p53 expression levels. Other factors, such as patient clinical history, family history, and menopausal status, may also be considered when evaluating breast cancer prognosis via the methods of the invention. In some embodiments, patient data obtained via the methods disclosed herein may be coupled with analysis of clinical information and existing tests for breast cancer prognosis to develop a reference laboratory prognostic algorithm. Such algorithms find use in stratifying breast cancer patients, particularly early-stage breast cancer patients, into good and bad prognosis populations. Patients assessed as having a poor prognosis may be upstaged for more aggressive breast cancer treatment.

The methods of the invention permit the superior assessment of breast cancer prognosis in comparison to analysis of other known prognostic indicators (e.g., lymph node involvement, tumor size, histologic grade, estrogen and progesterone receptor levels, Her2/neu status, tumor ploidy, and family history). In particular aspects of the invention, the sensitivity and specificity are equal to or greater than those of known cancer prognostic evaluation methods. The endpoint for assessing specificity and sensitivity is comparison of the prognosis or outcome predicted using the methods of the invention (i.e., at or near the time of diagnosis) with the actual clinical outcome (i.e., whether the patient remained cancer-free or suffered a recurrence within a specified time period). As used herein, “specificity” refers to the level at which a method of the invention can accurately identify true negatives. In a clinical study, specificity is calculated by dividing the number of true negatives by the sum of true negatives and false positives. By “sensitivity” is intended the level at which a method of the invention can accurately identify samples that are true positives. Sensitivity is calculated in a clinical study by dividing the number of true positives by the sum of true positives and false negatives. In some embodiments, the sensitivity of the disclosed methods for the evaluation of breast cancer is at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more. Furthermore, the specificity of the present methods is preferably at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more. In further embodiments, the combined sensitivity and specificity value for the prognostic methods of the invention is assessed. By “combined sensitivity and specificity value” is intended the sum of the individual specificity and sensitivity values, as defined herein above.
The combined sensitivity and specificity value of the present methods is preferably at least about 105%, 110%, 115%, 120%, 130%, 140%, 150%, 160% or more.
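The calculations defined above can be illustrated with a short sketch. The function names and the example counts are hypothetical and are not data from any study described herein; values are returned as fractions, so a combined value of 1.5 corresponds to 150%.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positives divided by (true positives + false negatives)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negatives divided by (true negatives + false positives)."""
    return true_neg / (true_neg + false_pos)


def combined_value(true_pos: int, false_pos: int,
                   true_neg: int, false_neg: int) -> float:
    """Sum of sensitivity and specificity (maximum 2.0, i.e., 200%)."""
    return sensitivity(true_pos, false_neg) + specificity(true_neg, false_pos)


if __name__ == "__main__":
    # Hypothetical counts chosen only to make the arithmetic easy to follow.
    tp, fn, tn, fp = 30, 10, 120, 40
    print(f"sensitivity = {sensitivity(tp, fn):.0%}")             # 75%
    print(f"specificity = {specificity(tn, fp):.0%}")             # 75%
    print(f"combined    = {combined_value(tp, fp, tn, fn):.0%}")  # 150%
```

A test that assigns outcomes at random has an expected combined value of 100%, which is why the thresholds recited above begin at 105%.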

As used herein, the definitions of “true” and “false” positives and negatives will be dependent upon whether the biomarker or combination of biomarkers under consideration are good outcome or bad outcome biomarkers. That is, in the case of good outcome biomarkers (i.e., those indicative of a good prognosis), “true positive” refers to those samples exhibiting overexpression of the biomarker of interest, as determined by the methods of the invention (e.g., positive staining by immunohistochemistry), that have a confirmed good actual clinical outcome. In contrast, “false positives” display overexpression of the good outcome biomarker(s) but have a confirmed bad actual clinical outcome. “True negatives” and “false negatives” with respect to good outcome biomarkers do not display biomarker overexpression (e.g., do not stain positive in immunohistochemistry methods) and have confirmed bad and good actual clinical outcomes, respectively.

Similarly, in the case of bad outcome biomarkers, “true positives” refers to those samples exhibiting overexpression of the biomarker or combination of biomarkers of interest that have a confirmed bad actual clinical outcome. That is, “true positive” with respect to both good and bad outcome biomarkers refers to samples in which the actual clinical outcome (i.e., good or bad) is accurately predicted. “False positives” display overexpression of the bad outcome biomarker but have a confirmed good actual clinical outcome. “True negatives” and “false negatives” with respect to bad outcome biomarkers do not display biomarker overexpression and have confirmed good and bad actual clinical outcomes, respectively.
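The definitions in the two preceding paragraphs can be summarized as a single rule: a sample is “positive” when the biomarker is overexpressed, and the call is “true” when the outcome the biomarker is indicative of actually occurred. The following sketch encodes that rule; the function name and the string labels are hypothetical.

```python
def classify_sample(overexpressed: bool, good_outcome: bool,
                    marker_type: str) -> str:
    """Return "TP", "FP", "TN", or "FN" for one sample.

    overexpressed -- biomarker overexpression detected (e.g., positive staining)
    good_outcome  -- confirmed actual clinical outcome was good
    marker_type   -- "good" for a good outcome biomarker, "bad" for a bad one
    """
    if marker_type not in ("good", "bad"):
        raise ValueError("marker_type must be 'good' or 'bad'")
    # Did the outcome that this type of biomarker indicates actually occur?
    indicated_outcome_occurred = (
        good_outcome if marker_type == "good" else not good_outcome
    )
    if overexpressed:
        return "TP" if indicated_outcome_occurred else "FP"
    return "FN" if indicated_outcome_occurred else "TN"


if __name__ == "__main__":
    # Bad outcome biomarker, overexpressed, patient had a bad outcome.
    print(classify_sample(True, False, "bad"))   # TP
    # Good outcome biomarker, not overexpressed, patient had a bad outcome.
    print(classify_sample(False, False, "good"))  # TN
```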

Breast cancer is managed by several alternative strategies that may include, for example, surgery, radiation therapy, hormone therapy, chemotherapy, or some combination thereof. As is known in the art, treatment decisions for individual breast cancer patients can be based on the number of lymph nodes involved, estrogen and progesterone receptor status, size of the primary tumor, and stage of the disease at diagnosis. Analysis of a variety of clinical factors and clinical trials has led to the development of recommendations and treatment guidelines for early-stage breast cancer by the International Consensus Panel of the St. Gallen Conference (2001). See Goldhirsch et al. (2001) J. Clin. Oncol. 19:3817-3827, which is herein incorporated by reference in its entirety. The guidelines indicate that treatment for patients with node-negative breast cancer varies substantially according to the baseline prognosis. More aggressive treatment is recommended for patients with a relatively high risk of recurrence when compared to patients with a relatively low risk of recurrence. It has been demonstrated that chemotherapy for the high-risk population has resulted in a reduction in the risk of relapse. Women in a low-risk category are usually treated with radiation and hormonal therapy. Stratification of patients into poor prognosis or good prognosis risk groups at the time of diagnosis using the methods disclosed herein may provide an additional or alternative treatment decision-making factor. The methods of the invention permit the differentiation of breast cancer patients with a good prognosis from those more likely to suffer a recurrence (i.e., patients who might need or benefit from additional aggressive treatment at the time of diagnosis). The methods of the invention find particular use in choosing appropriate treatment for early-stage breast cancer patients.
As discussed above, the majority of breast cancer patients diagnosed at an early stage of the disease enjoy long-term survival following surgery and/or radiation therapy without further adjuvant therapy. A significant percentage (approximately 20%) of these patients, however, will suffer disease recurrence or death, leading to clinical recommendations that some or all early-stage breast cancer patients should receive adjuvant therapy (e.g., chemotherapy). The methods of the present invention find use in identifying this high-risk, poor prognosis population of early-stage breast cancer patients and thereby determining which patients would benefit from continued and/or more aggressive therapy and close monitoring following treatment. For example, early-stage breast cancer patients assessed as having a poor prognosis by the methods disclosed herein may be selected for more aggressive adjuvant therapy, such as chemotherapy, following surgery and/or radiation treatment. In particular embodiments, the methods of the present invention may be used in conjunction with the treatment guidelines established by the St. Gallen Conference to permit physicians to make more informed breast cancer treatment decisions. The present methods for evaluating breast cancer prognosis can also be combined with other prognostic methods and molecular marker analyses known in the art (e.g., Her2/neu, Ki67, and p53 expression levels) for purposes of selecting an appropriate breast cancer treatment. Furthermore, the methods of the invention can be combined with later-developed prognostic methods and molecular marker analyses not currently known in the art.

The methods disclosed herein also find use in predicting the response of a breast cancer patient to a selected treatment. By “predicting the response of a breast cancer patient to a selected treatment” is intended assessing the likelihood that a patient will experience a positive or negative outcome with a particular treatment. As used herein, “indicative of a positive treatment outcome” refers to an increased likelihood that the patient will experience beneficial results from the selected treatment (e.g., complete or partial remission, reduced tumor size, etc.). By “indicative of a negative treatment outcome” is intended an increased likelihood that the patient will not benefit from the selected treatment with respect to the progression of the underlying breast cancer. In some aspects of the invention, the selected treatment is chemotherapy.

In certain embodiments, methods for predicting the likelihood of survival of a breast cancer patient are provided. In particular, the methods may be used to predict the likelihood of long-term, disease-free survival. By “predicting the likelihood of survival of a breast cancer patient” is intended assessing the risk that a patient will die as a result of the underlying breast cancer. “Long-term, disease-free survival” is intended to mean that the patient does not die from or suffer a recurrence of the underlying breast cancer within a period of at least five years, more particularly at least ten or more years, following initial diagnosis or treatment. Such methods for predicting the likelihood of survival of a breast cancer patient comprise detecting expression of multiple biomarkers in a patient sample, wherein the likelihood of survival, particularly long-term, disease-free survival, decreases as the number of biomarkers determined to be overexpressed in the patient sample increases. For example, in one aspect of the invention, the expression of at least five biomarkers is determined, wherein overexpression of none of the biomarkers is indicative of an increased likelihood of survival, and wherein overexpression of two or more biomarkers is indicative of a decreased likelihood of survival. Likelihood of survival may be assessed in comparison to, for example, breast cancer survival statistics available in the art. In other embodiments, methods for predicting the likelihood of survival of a breast cancer patient comprise determining the expression of at least six biomarkers and assessing the number of these biomarkers that are overexpressed. Biomarkers useful for these methods may be selected from, for example, E2F1, SLPI, MUC-1, src, p21ras, and PSMB9. See generally Examples 8 and 9.
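The five-biomarker example in the preceding paragraph amounts to counting overexpressed biomarkers. A minimal sketch follows; the function name and return labels are hypothetical, and because the paragraph does not state how a count of exactly one is interpreted, the sketch reports that case as "unspecified" rather than assuming a result.

```python
def survival_indication(overexpression: dict) -> str:
    """Map per-biomarker overexpression calls to a survival indication.

    overexpression -- {biomarker name: True if overexpressed}, covering
                      at least five biomarkers as in the example above
    """
    if len(overexpression) < 5:
        raise ValueError("expression of at least five biomarkers is required")
    count = sum(bool(call) for call in overexpression.values())
    if count == 0:
        return "increased likelihood of survival"
    if count >= 2:
        return "decreased likelihood of survival"
    # Exactly one overexpressed biomarker: not addressed in the example.
    return "unspecified"


if __name__ == "__main__":
    # Hypothetical staining calls for one patient sample.
    sample = {"E2F1": True, "SLPI": False, "src": True,
              "p21ras": False, "PSMB9": False}
    print(survival_indication(sample))  # decreased likelihood of survival
```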

The biomarkers of the invention include genes and proteins. Such biomarkers include DNA comprising the entire or partial sequence of the nucleic acid sequence encoding the biomarker, or the complement of such a sequence. The biomarker nucleic acids also include RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest. A biomarker protein is a protein encoded by or corresponding to a DNA biomarker of the invention. A biomarker protein comprises the entire or partial amino acid sequence of any of the biomarker proteins or polypeptides. Fragments and variants of biomarker genes and proteins are also encompassed by the present invention. By “fragment” is intended a portion of the polynucleotide or a portion of the amino acid sequence and hence protein encoded thereby. Polynucleotides that are fragments of a biomarker nucleotide sequence generally comprise at least 10, 15, 20, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, or 1,400 contiguous nucleotides, or up to the number of nucleotides present in a full-length biomarker polynucleotide disclosed herein. A fragment of a biomarker polynucleotide will generally encode at least 15, 25, 30, 50, 100, 150, 200, or 250 contiguous amino acids, or up to the total number of amino acids present in a full-length biomarker protein of the invention. “Variant” is intended to mean substantially similar sequences. Generally, variants of a particular biomarker of the invention will have at least about 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to that biomarker as determined by sequence alignment programs.

A “biomarker” is any gene or protein whose level of expression in a tissue or cell is altered compared to that of a normal or healthy cell or tissue. The biomarkers of the present invention are genes and proteins whose overexpression correlates with cancer, particularly breast cancer, prognosis. In particular embodiments, selective overexpression of a biomarker or combination of biomarkers of interest in a patient sample is indicative of a poor cancer prognosis. By “indicative of a poor prognosis” is intended that overexpression of the particular biomarker or combination of biomarkers is associated with an increased likelihood of relapse or recurrence of the underlying cancer or tumor, metastasis, or death, as defined herein above. For example, “indicative of a poor prognosis” may refer to an increased likelihood of relapse or recurrence of the underlying cancer or tumor, metastasis, or death within five years, more particularly ten years. Biomarkers that are indicative of a poor prognosis may be referred to herein as “bad outcome biomarkers.” In other aspects of the invention, the absence of overexpression of a biomarker or combination of biomarkers of interest is indicative of a good prognosis. As used herein, “indicative of a good prognosis” refers to an increased likelihood that the patient will remain cancer-free, as defined herein above. In some embodiments, “indicative of a good prognosis” refers to an increased likelihood that the patient will remain cancer-free for at least five, more particularly at least ten years. Such biomarkers may be referred to as “good outcome biomarkers.”

The biomarkers of the present invention include any gene or protein whose overexpression correlates with breast cancer prognosis, as described herein above. Biomarkers include genes and proteins that are indicative of a poor breast cancer prognosis (i.e., bad outcome biomarkers) as well as those that are indicative of a good prognosis (i.e., good outcome biomarkers). Biomarkers of particular interest include genes and proteins that are involved in regulation of cell growth and proliferation, cell cycle control, DNA replication and transcription, apoptosis, signal transduction, angiogenesis/lymphogenesis, or metastasis. In some embodiments, the biomarkers regulate protease systems involved in tissue remodeling, extracellular matrix degradation, and adjacent tissue invasion. Although any biomarker whose overexpression is indicative of breast cancer prognosis can be used to practice the invention, in particular embodiments, biomarkers are selected from the group consisting of SLPI, p21ras, MUC-1, DARPP-32, phospho-p27, src, MGC 14832, myc, TGFβ-3, SERHL, E2F1, PDGFRα, NDRG-1, MCM2, PSMB9, MCM6, and p53. See Table 43. In one embodiment, the biomarkers of interest comprise SLPI, PSMB9, phospho-p27, src, E2F1, p21ras, or p53. In one aspect of the invention, the methods for evaluating breast cancer prognosis comprise detecting the expression of E2F1 and SLPI, wherein overexpression of at least one of these biomarkers is indicative of a poor prognosis. In another embodiment, the methods comprise detecting the expression of E2F1, src, and SLPI, wherein overexpression of at least two of the biomarkers is indicative of a poor breast cancer prognosis. In a further embodiment, the methods of the present invention comprise detecting the expression of E2F1, src, PSMB9, and SLPI, wherein overexpression of at least two of these biomarkers is indicative of a poor breast cancer prognosis. 
In other aspects of the invention, the expression of E2F1, SLPI, PSMB9, p21ras, and src is detected, and overexpression of at least two of these biomarkers is indicative of a poor prognosis. In yet another embodiment, the methods comprise detecting the expression of SLPI, p21ras, E2F1, PSMB9, phospho-p27, and src in a patient sample, wherein overexpression of at least four of these biomarkers is indicative of a poor prognosis.

In another embodiment, the biomarkers of interest comprise E2F1, SLPI, MUC-1, src, p21ras, and PSMB9. In one aspect of the invention, the methods for evaluating breast cancer prognosis comprise detecting the expression of E2F1 and SLPI, wherein overexpression of at least one of these biomarkers is indicative of a poor prognosis. In another embodiment, the methods comprise detecting the expression of E2F1, SLPI, and PSMB9, wherein overexpression of at least two of the biomarkers is indicative of a poor breast cancer prognosis. In a further embodiment, the methods of the present invention comprise detecting the expression of E2F1, SLPI, MUC-1, and src, wherein overexpression of at least two of these biomarkers is indicative of a poor breast cancer prognosis. In other aspects of the invention, the expression of E2F1, SLPI, MUC-1, src, and p21ras is detected, and overexpression of at least two of these biomarkers is indicative of a poor prognosis. In yet another embodiment, the methods comprise detecting the expression of E2F1, SLPI, MUC-1, src, p21ras, and PSMB9 in a patient sample, wherein overexpression of at least four of these biomarkers is indicative of a poor prognosis.
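The panel embodiments above share one decision rule: a sample is indicative of a poor prognosis when at least a stated number of the panel biomarkers are overexpressed. A minimal sketch of that rule follows. The function name and the input format (a mapping from biomarker name to a per-biomarker overexpression call) are assumptions made for illustration; how each individual call is reached is outside the sketch.

```python
from typing import Mapping, Sequence


def poor_prognosis_call(overexpressed: Mapping[str, bool],
                        panel: Sequence[str],
                        min_overexpressed: int) -> bool:
    """Apply an "at least k of n" rule to one patient sample.

    overexpressed: per-biomarker overexpression calls for the sample.
    panel: the biomarkers that make up the panel.
    min_overexpressed: number of overexpressed panel members required.
    """
    missing = [marker for marker in panel if marker not in overexpressed]
    if missing:
        raise KeyError(f"no overexpression call for: {', '.join(missing)}")
    count = sum(1 for marker in panel if overexpressed[marker])
    return count >= min_overexpressed


# Six-biomarker panel from the text, with its threshold of at least four.
SIX_MARKER_PANEL = ("E2F1", "SLPI", "MUC-1", "src", "p21ras", "PSMB9")

# Hypothetical sample: four of the six panel biomarkers are overexpressed.
sample = {"E2F1": True, "SLPI": True, "MUC-1": False,
          "src": True, "p21ras": True, "PSMB9": False}
```

With this hypothetical sample, the six-biomarker rule (at least four) returns a poor-prognosis call, as does the two-biomarker E2F1/SLPI rule (at least one).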

Secretory Leukocyte Protease Inhibitor (SLPI) is a non-specific inhibitor that can inactivate a number of proteases including leukocyte elastase, trypsin, chymotrypsin and the cathepsins (e.g., cathepsin G). SLPI is known to be involved in inflammation and the inflammatory response in relation to tissue repair. Its inhibitory effect contributes to the immune response by protecting epithelial surfaces from attack by endogenous proteolytic enzymes. Protease inhibitors have generally been considered to counteract tumor progression and metastasis. However, expression of serine protease inhibitors (SPIs) in tumors is often associated with poor prognosis of cancer patients. Cathepsin G is overexpressed in breast cancer and is an indicator of poor prognosis. The gene location for SLPI is 20q12, which is a chromosomal region implicated in breast cancer chromosomal alterations and aneuploidy.

PSMB9 is a member of the proteasome B-type family, also known as the T1B family, that is a 20S core beta subunit. This gene is located in the class II region of the MHC (major histocompatibility complex). Expression of this gene is induced by gamma interferon, and this gene product replaces catalytic subunit 1 (proteasome beta 6 subunit) in the immunoproteasome. Proteolytic processing is required to generate a mature subunit.

NDRG-1 (N-Myc downstream regulated) is upregulated during cell differentiation, repressed by N-myc and c-myc in embryonic cells, and suppressed in several tumor cells. Overexpression may be related to hypoxia and the subsequent signaling to induce angiogenesis. Hypoxia causes the accumulation of the transcription factor hypoxia-inducible factor 1 (HIF-1), culminating in the expression of hypoxia-inducible genes such as those for vascular endothelial growth factor (VEGF) and NDRG-1. NDRG-1 is found in some breast cancers as an overexpressed mRNA. NDRG-1 is located on chromosome 8q24 adjacent to the c-myc gene.

MUC1 is a heavily O-glycosylated transmembrane protein expressed on most secretory epithelium, including mammary glands and some hematopoietic cells. It is expressed abundantly in lactating mammary glands and overexpressed in more than 90% of breast carcinomas and metastases. In normal mammary glands, it is expressed on the apical surface of glandular epithelium.

p27 is a key regulator of the cell cycle and participates in the G1-to-S phase progression. It interacts specifically with the cyclin E/cdk2 complex during G1 phase and also with D-type cyclin-cdks. p27 can be phosphorylated on threonine 187 by Cdks. Phosphorylation of p27 at threonine 187 is also cell-cycle dependent, present in proliferating cells but undetectable in G1 cells. Activation of p27 degradation is seen in proliferating cells and in many types of aggressive human carcinomas. Overexpression of p27 may lead to an inhibition of apoptosis and resistance to some chemotherapy.

The Src family of protein tyrosine kinases (including Src, Lyn, Fyn, Yes, Lck, Blk, Hck, etc.) is important in the regulation of growth and differentiation of eukaryotic cells. Src activity is regulated by tyrosine phosphorylation at two sites with opposing effects. Phosphorylation of Tyr416 in the activation loop of the kinase domain upregulates the enzyme. Phosphorylation of Tyr527 in the C-terminal tail by Csk renders the enzyme less active.

E2F1 is a member of a family of transcription factors involved in the regulation of both G1 and S phase cyclins, in particular cyclin D1. These proteins participate in the Rb pathway of cell-cycle regulation and control of DNA synthesis. During the G1 phase of the cell cycle, the E2F transcription factors are bound in an inactive complex with the Rb tumor suppressor protein. At the G1/S boundary of the cell cycle, the Rb protein is hyperphosphorylated and releases the E2F transcription factor from its inhibitory complex. The E2F transcription factor then activates transcription of those genes responsible for the S-phase of the cell cycle, predominantly resulting in initiation of DNA synthesis and preparation for mitosis and subsequent cell division. Overexpression of E2F1 has been shown to lead to the induction of apoptosis, possibly through the inhibition of cyclin D1-dependent kinase activity coupled with the induction of a p16-related transcript. In addition to regulation of E2F1 at the level of transcription, E2F1 protein levels are also controlled by the ubiquitin-proteasome-dependent degradation pathway. Ubiquitination is blocked by the Rb and E2F1 complex, which directly controls aspects of cell cycle progression.

p21ras is a member of a large group of cytoplasmic proteins involved in signal transduction. Guanine nucleotide binding proteins (G proteins) comprise a large group of cytoplasmic proteins present in eukaryotic cells that are involved in signal transduction. There are two forms, the large heterotrimeric G proteins and the smaller monomers. The three ras oncogenes, H-ras, K-ras, and N-ras, are members of the smaller monomeric G proteins and are located on chromosomes 11, 12, and 1, respectively. They encode 21-kD proteins of 188 amino acids, called p21s. p21ras proteins are involved in normal cell growth, protease activities, and cell adhesion.

Collectively, the three forms of p21ras function by linking ligand-mediated extracellular receptor activation with intracellular tyrosine kinase activation and subsequent initiation of a number of cellular processes relevant to breast cancer progression, including DNA replication, proliferation, and anchorage independent growth. The K- and H-ras genes are most often implicated in breast cancer. In both of these ras genes, mutations at codons 12 and 13 are common. These gain-of-function mutations result in constitutive activation that uncouples the normal ligand-induced signal transduction within the ras signaling pathways. Less common in breast cancer is the involvement of N-ras. Two mechanisms have been reported for N-ras associated changes in breast cancer: mutation at codon 61 resulting in constitutive activation of the oncogene, similar to the mutations mentioned above for K- and H-ras, and chromosomal amplification. Moreover, in addition to activation of intracellular signaling pathways, the ras oncogenes have been reported to induce overexpression of proteases important for tissue remodeling and invasion. H-ras has been implicated in matrix metalloprotease-2 (MMP-2) overexpression, and N-ras has been associated with overexpression of MMP-9. See generally Correll and Zoll (1988) Human Genetics 79:225-259; Tong et al. (1989) Nature 337:90-93; Watson et al. (1991) Breast Cancer Res. Treat. 17:161-169; Dati et al. (1991) Int. J. Cancer 47:833-838; Archer et al. (1995) Br. J. Cancer 72:1259-1266; Bland et al. (1995) Ann. Surg. 221:706-718; Shackney et al. (1998) Clin. Cancer Res. 4:913-928; and Gohring et al. (1999) Tumor Biol. 20:173-183, all of which are herein incorporated by reference in their entirety. Detection of any form (i.e., H-, K-, N-ras) of the p21ras proteins is encompassed by the present invention.

Minichromosome maintenance (MCM) proteins play an essential part in eukaryotic DNA replication. Each of the MCM proteins has DNA-dependent ATPase motifs in its highly conserved central domain. Levels of MCM proteins generally increase in a variable manner as normal cells progress from G0 into the G1/S phase of the cell cycle. In the G0 phase, MCM2 and MCM5 proteins are much less abundant than are the MCM7 and MCM3 proteins. MCM6 forms a complex with MCM2, MCM4, and MCM7, which binds histone H3. In addition, the subcomplex of MCM4, MCM6, and MCM7 has helicase activity, which is mediated by the ATP-binding activity of MCM6 and the DNA-binding activity of MCM4. See, for example, Freeman et al. (1999) Clin. Cancer Res. 5:2121-2132; Lei et al. (2001) J. Cell Sci. 114:1447-1454; Ishimi et al. (2003) Eur. J. Biochem. 270:1089-1101, all of which are herein incorporated by reference in their entirety.

DARPP32 is an inhibitor of protein phosphatase 1 whose biological function and inhibitory activity are modulated through phosphorylation of specific amino acid residues within the DARPP32 protein. Threonine 34 (T34) phosphorylation renders the DARPP32 protein a specific protein phosphatase 1 inhibitor. However, threonine 75 (T75) phosphorylation renders DARPP32 an inhibitor of protein kinase A (PKA). The gene location for DARPP32 is 17q21.2, which is known to be adjacent to the her2/neu (c-erb-B2 receptor tyrosine kinase) gene at 17q12. This region has been implicated in breast cancer chromosomal amplifications and resultant poor outcome in 25-35% of breast cancers. Several publications have demonstrated specific transcriptional activation of this 17q12-21 amplicon in breast cancer, with a number of genes located within this amplicon being overexpressed.

p53 plays multiple roles in cells. Expression of high levels of wild-type, but not mutant, p53 has two outcomes: cell cycle arrest or apoptosis. The observation that DNA-damaging agents induce levels of p53 in cells led to the definition of p53 as a checkpoint factor, akin perhaps to the product of the rad9 gene in yeast. While dispensable for viability, in response to genotoxic stress p53 acts as an “emergency brake” inducing either arrest or apoptosis, protecting the genome from accumulating excess mutations. Consistent with this notion, cells lacking p53 have been shown to be genetically unstable and, thus, more prone to tumors. The p53 protein is located in the nucleus of cells and is very labile. p53 is mutated in roughly 50% of all human tumors, predominantly in the DNA-binding domain codons.

Although the above biomarkers have been discussed in detail, any biomarker whose overexpression is indicative of breast cancer prognosis can be used to practice the invention, including biomarkers not yet identified in the art. Such biomarkers include genes and proteins that are, for example, involved in cell proliferation, cell cycle control, or the generalized mechanisms of cancer motility and invasion. Biomarkers of potential interest include cyclooxygenase-2 (cox-2), rhoC, c-myc, urokinase plasminogen activator receptor (uPAR), Wilms' tumor suppressor, akt kinase, and osteopontin. See, for example, Perou et al. (2000) Nature 406:747-752; Sorlie et al. (2001) Proc. Natl. Acad. Sci. 98:10869-10874; Van't Veer et al. (2002) Nature 415:530-536; Huang et al. (2003) Lancet 361:1590-1596, all of which are herein incorporated by reference in their entirety.

In particular embodiments, the biomarkers are kinases that are involved in signal transduction pathways, such as PI3K regulatory a, LTk, Ser/thr kinase 15, MAPK8IPI, MAPKAPK2, PK428, and PRKR. Growth factors, extracellular signal transduction proteins, and extracellular matrix proteins are also biomarkers of interest. Such proteins include EGFR, TNF receptor associated factor 4, GFR bound protein 7, ErbB2 (her 2), VEGF, GDF1, IGFBP5, EGF8 ras homolog, MMP 9, MMP 7, SLPI, keratin 5, keratin 17, laminin gamma 2 (laminin V), troponin, and tubulin.

In some aspects of the invention, the biomarkers comprise genes and proteins that are involved in chromosome condensation and maintenance, such as, for example, Cc related, HMG non-histone chromosomal 11, MMD5, MCM5, MCM6, and Swi/snf related actin. Biomarkers that are associated with centromere and centrosome function, including CENPA, CENPF, CENPE, Bub1, polo-like kinase, HsEg5, MCAK, and HSET, can also be used in the methods described herein. The biomarkers of the invention may also comprise transcription factors, particularly those associated with cell cycle regulation. Transcription factors of interest include but are not limited to E2F1, E2F4, NDRG-1, ORC6L, PCNA, nuclear factor 1, EZH2, and TFAP2A. Cyclins, such as CDC20, CDC25B, cyclin A2, cyclin E, and cyclin F, may also be used to practice the disclosed methods.

Although the methods of the invention require the detection of at least one, more particularly at least two, biomarker(s) in a patient sample for evaluating breast cancer prognosis, 3, 4, 5, 6, 7, 8, 9, 10 or more biomarkers may be used to practice the present invention. It is recognized that detection of more than one biomarker in a body sample may be used to evaluate cancer, particularly breast cancer, prognosis. Therefore, in some embodiments, two or more biomarkers are used, more preferably, two or more complementary biomarkers. By “complementary” is intended that detection of the combination of biomarkers in a body sample results in the accurate determination of cancer prognosis in a greater percentage of cases than would be identified if only one of the biomarkers were used. Thus, in some cases, a more accurate determination of cancer prognosis can be made by using at least two biomarkers. Accordingly, where at least two biomarker proteins are used, at least two antibodies directed to distinct biomarker proteins will be used to practice the immunohistochemistry methods disclosed herein. The antibodies may be contacted with the body sample simultaneously or successively.
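The notion of complementary biomarkers can be made concrete by comparing the fraction of cases called correctly by each biomarker alone with the fraction called correctly by the combination. The sketch below is illustrative only: the outcome and call data are hypothetical, and the "either biomarker overexpressed" combination rule is an assumption chosen for simplicity, not a rule taken from the specification.

```python
from typing import List, Sequence


def accuracy(calls: Sequence[bool], outcomes: Sequence[bool]) -> float:
    """Fraction of cases in which the poor-prognosis call matches the known outcome."""
    if not calls or len(calls) != len(outcomes):
        raise ValueError("calls and outcomes must be non-empty and of equal length")
    return sum(1 for c, o in zip(calls, outcomes) if c == o) / len(calls)


def combine_any(*marker_calls: Sequence[bool]) -> List[bool]:
    """Combined call: poor prognosis if any one of the biomarkers is overexpressed."""
    return [any(values) for values in zip(*marker_calls)]


# Hypothetical data for eight patients (True = poor outcome / overexpressed).
outcomes = [True, True, True, True, False, False, False, False]
marker_a = [True, True, False, False, False, False, False, False]
marker_b = [False, False, True, True, False, False, False, True]

combined = combine_any(marker_a, marker_b)
```

In this hypothetical data set, the first biomarker alone is correct in 6 of 8 cases, the second in 5 of 8, and the combination in 7 of 8, which is the sense in which the two would be called complementary.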

When a combination of two or more biomarkers is used, the biomarkers will typically be substantially statistically independent of one another. By “statistically independent” biomarkers is intended that the prognoses generated therefrom are independent such that one biomarker does not provide substantially repetitive information with regard to the complementary biomarker. This may ensure, for instance, that a biomarker is not used in conjunction with a first biomarker when the two are not substantially statistically independent. The dependence of the two biomarkers may indicate that they are duplicative and that the addition of a second biomarker adds no additional value to the prognostic power of a given pair of biomarkers. In order to optimize the prognostic power of a given panel of biomarkers it is also desirable to reduce the amount of signal “noise” by minimizing the use of biomarkers that provide duplicative prognostic information when compared to another biomarker in the panel. Methods for determining statistical independence are known in the art. Statistical independence of biomarkers of interest can be assessed using any method, including, for example, the methods disclosed in U.S. Application Publication No. 2006/0078926 entitled “Methods and Computer Programs for Analysis and Optimization of Marker Candidates for Cancer Prognosis,” filed Sep. 22, 2005 and incorporated by reference in its entirety. Where independent, prognostic biomarkers are used to practice the present methods, the prognostic value is increased by detecting the expression of 2, 3, 4, 5, 6, 7 or more biomarkers. In such cases, any combination of independent biomarkers can be used.
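One simple way to screen a pair of biomarkers for duplicative information is to measure the association between their binary overexpression calls across a set of samples. The sketch below uses the phi coefficient of the 2x2 contingency table. This is a generic statistic substituted here for illustration; it is not the method of the publication cited above, and the function name and input format are assumptions.

```python
import math
from typing import Sequence


def phi_coefficient(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Phi coefficient between two sets of binary overexpression calls.

    Values near 0 suggest the biomarkers carry largely independent
    information; values near +1 or -1 suggest they are duplicative.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both biomarkers must be scored on the same samples")
    pairs = list(zip(calls_a, calls_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both overexpressed
    n10 = sum(1 for a, b in pairs if a and not b)      # only the first
    n01 = sum(1 for a, b in pairs if not a and b)      # only the second
    n00 = sum(1 for a, b in pairs if not a and not b)  # neither
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when a biomarker has the same call in every sample")
    return (n11 * n00 - n10 * n01) / denominator
```

A pair of biomarkers whose calls agree in every sample gives a coefficient of 1.0 (fully duplicative), while a pair whose calls are evenly cross-distributed gives 0.0. Any cutoff for "substantially independent" would have to be chosen and validated separately.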

One of skill in the art will also recognize that a panel of biomarkers can be used to evaluate the prognosis of a breast cancer patient in accordance with the methods of the invention. In some embodiments, a panel comprising at least two biomarkers selected from the group consisting of SLPI, p21ras, MUC-1, DARPP-32, phospho-p27, src, MGC 14832, myc, TGFβ-3, SERHL, E2F1, PDGFRα, NDRG-1, MCM2, PSMB9, MCM6, and p53 is provided. One particular panel of biomarkers may comprise, for example, all or a subset of E2F1, SLPI, MUC-1, src, p21ras, and PSMB9. A panel of biomarkers may comprise any number or combination of biomarkers of interest. In certain aspects of the invention, a panel comprises at least two statistically independent, prognostic biomarkers.

In particular embodiments, the methods for evaluating breast cancer prognosis comprise collecting a patient body sample, preferably a breast tissue sample, more preferably a primary breast tumor tissue sample, contacting the sample with at least one antibody specific for a biomarker of interest, detecting antibody binding, and determining if the biomarker is overexpressed. That is, samples are incubated with the biomarker antibody for a time sufficient to permit the formation of antibody-antigen complexes, and antibody binding is detected, for example, by a labeled secondary antibody. Samples that exhibit overexpression of at least one bad outcome biomarker, as determined by antibody binding, are classified as having a poor prognosis. Similarly, patient samples that display overexpression of at least one good outcome biomarker are categorized as having a good prognosis. Furthermore, the overexpression of certain combinations of biomarkers of interest is specifically used to distinguish breast cancer patients with a poor prognosis from those with a good prognosis. In some aspects of the invention, the methods comprise detecting the expression of two or more biomarkers in a patient sample and determining if said biomarkers are overexpressed, wherein overexpression of all or some subset of these biomarkers is indicative of breast cancer prognosis. For example, in one embodiment, the methods comprise detecting the expression of SLPI, p21ras, E2F1, PSMB9, phospho-p27, and src, wherein overexpression of at least four of these biomarkers is indicative of a poor prognosis. In another aspect of the invention, the methods comprise detecting the expression of SLPI, E2F1, and src, wherein overexpression of at least two of these biomarkers is indicative of a poor prognosis. 
In other embodiments, the methods comprise detecting the expression of E2F1, SLPI, MUC-1, src, p21ras, and PSMB9, wherein overexpression of at least four of these biomarkers is indicative of a poor prognosis. In another aspect of the invention, the methods comprise detecting the expression of SLPI, E2F1, and MUC-1, wherein overexpression of at least two of these biomarkers is indicative of a poor prognosis.

By “body sample” is intended any sampling of cells, tissues, or bodily fluids in which expression of a biomarker can be detected. Examples of such body samples include but are not limited to blood, lymph, urine, gynecological fluids, biopsies, and smears. Bodily fluids useful in the present invention include blood, urine, saliva, nipple aspirates, or any other bodily secretion or derivative thereof. Blood can include whole blood, plasma, serum, or any derivative of blood. In preferred embodiments, the body sample comprises breast cells, particularly breast tissue from a biopsy, more particularly a breast tumor tissue sample. Body samples may be obtained from a patient by a variety of techniques including, for example, by scraping or swabbing an area, by using a needle to aspirate bodily fluids, or by removing a tissue sample (i.e., biopsy). Methods for collecting various body samples are well known in the art. In some embodiments, a breast tissue sample is obtained by, for example, fine needle aspiration biopsy, core needle biopsy, or excisional biopsy. Fixative and staining solutions may be applied to the cells or tissues for preserving the specimen and for facilitating examination. Body samples, particularly breast tissue samples, may be transferred to a glass slide for viewing under magnification. In preferred embodiments, the body sample is a formalin-fixed, paraffin-embedded breast tissue sample, particularly a primary breast tumor sample.

Any methods available in the art for detecting expression of biomarkers are encompassed herein. The expression of a biomarker of the invention can be detected on a nucleic acid level or a protein level. By “detecting expression” is intended determining the quantity or presence of a biomarker gene or protein. Thus, “detecting expression” encompasses instances where a biomarker is determined not to be expressed, not to be detectably expressed, expressed at a low level, expressed at a normal level, or overexpressed. In order to determine overexpression, the body sample to be examined may be compared with a corresponding body sample that originates from a healthy person. That is, the “normal” level of expression is the level of expression of the biomarker in, for example, a breast tissue sample from a human subject or patient not afflicted with breast cancer. Such a sample can be present in standardized form. In some embodiments, determination of biomarker overexpression requires no comparison between the body sample and a corresponding body sample that originates from a healthy person. For example, detection of overexpression of a biomarker indicative of a poor prognosis in a breast tumor sample may preclude the need for comparison to a corresponding breast tissue sample that originates from a healthy person. Moreover, in some aspects of the invention, no expression, underexpression, or normal expression (i.e., the absence of overexpression) of a biomarker or combination of biomarkers of interest provides useful information regarding the prognosis of a breast cancer patient.
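Where overexpression is determined by comparison with a normal reference, the comparison can be sketched as a ratio test. The following is a minimal illustration, not a rule taken from the specification: the two-fold threshold, the detection limit, the category labels, and the function name are all hypothetical, and the expression levels are assumed to be on a common linear scale.

```python
def classify_expression(sample_level: float,
                        normal_level: float,
                        fold_threshold: float = 2.0,
                        detection_limit: float = 0.0) -> str:
    """Classify a biomarker level relative to a normal reference level.

    fold_threshold and detection_limit are illustrative assumptions.
    """
    if normal_level <= 0:
        raise ValueError("normal_level must be positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must be greater than 1")
    if sample_level <= detection_limit:
        return "not detectably expressed"
    ratio = sample_level / normal_level
    if ratio >= fold_threshold:
        return "overexpressed"
    if ratio <= 1 / fold_threshold:
        return "low"
    return "normal"
```

Under these assumed settings, a sample at five times the normal level is classified as overexpressed, and a sample at 1.2 times the normal level is classified as normal.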

Methods for detecting expression of the biomarkers of the invention comprise any methods that determine the quantity or the presence of the biomarkers either at the nucleic acid or protein level. Such methods are well known in the art and include but are not limited to western blots, northern blots, southern blots, ELISA, immunoprecipitation, immunofluorescence, flow cytometry, immunohistochemistry, nucleic acid hybridization techniques, nucleic acid reverse transcription methods, and nucleic acid amplification methods. In particular embodiments, expression of a biomarker is detected on a protein level using, for example, antibodies that are directed against specific biomarker proteins. These antibodies can be used in various methods such as Western blot, ELISA, immunoprecipitation, or immunohistochemistry techniques. Likewise, immunostaining of breast tissue, particularly breast tumor tissue, can be combined with assessment of clinical information, conventional prognostic methods, and expression of molecular markers (e.g., Her2/neu, Ki67, p53, and hormone receptor status) known in the art. In this manner, the disclosed methods may permit the more accurate determination of breast cancer prognosis.

In one embodiment, antibodies specific for biomarker proteins are utilized to detect the expression of a biomarker protein in a body sample. The method comprises obtaining a body sample from a patient, contacting the body sample with at least one antibody directed to SLPI, p21ras, MUC-1, DARPP-32, phospho-p27, src, MGC 14832, myc, TGFβ-3, SERHL, E2F1, PDGFRα, NDRG-1, MCM2, PSMB9, or MCM6, and detecting antibody binding to determine if the biomarker is overexpressed in the patient sample. Overexpression of the biomarker protein is indicative of prognosis, more particularly, a bad breast cancer prognosis. In other embodiments, the methods of the invention comprise detecting the expression of at least two biomarkers, wherein overexpression of at least one of the biomarkers is indicative of prognosis. Such methods may comprise the detection of multiple biomarkers in a patient sample wherein it is the overexpression of all or a subset of these biomarkers that is indicative of breast cancer prognosis.

One aspect of the present invention provides an immunohistochemistry technique for evaluating the prognosis of a breast cancer patient. Specifically, this method comprises antibody staining of biomarkers within a breast tissue sample, more particularly a breast tumor sample, that are indicative of prognosis. One of skill in the art will recognize that the immunohistochemistry methods described herein below may be performed manually or in an automated fashion using, for example, the Autostainer Universal Staining System (Dako). One protocol for antibody staining (i.e., immunohistochemistry) of breast tissue samples is provided in Example 1.

In one immunohistochemistry method, a patient breast tissue sample is collected by, for example, biopsy techniques known in the art. Samples may be frozen for later preparation or immediately placed in a fixative solution. Tissue samples may be fixed by treatment with a reagent such as formalin, glutaraldehyde, methanol, or the like and embedded in paraffin. Methods for preparing slides for immunohistochemical analysis from formalin-fixed, paraffin-embedded tissue samples are well known in the art. In some embodiments, particularly the immunohistochemistry methods of the invention, samples may need to be modified in order to make the biomarker antigens accessible to antibody binding. For example, formalin fixation of tissue samples results in extensive cross-linking of proteins that can lead to the masking or destruction of antigen sites and, subsequently, poor antibody staining. As used herein, “antigen retrieval” or “antigen unmasking” refers to methods for increasing antigen accessibility or recovering antigenicity in, for example, formalin-fixed, paraffin-embedded tissue samples. Any method for making antigens more accessible for antibody binding may be used in the practice of the invention, including those antigen retrieval methods known in the art. See, for example, Hanausek and Walaszek, eds. (1998) Tumor Marker Protocols (Humana Press, Inc., Totowa, N.J.); and Shi et al., eds. (2000) Antigen Retrieval Techniques: Immunohistochemistry and Molecular Morphology (Eaton Publishing, Natick, Mass.), both of which are herein incorporated by reference in their entirety.

Antigen retrieval methods include but are not limited to treatment with proteolytic enzymes (e.g., trypsin, chymotrypsin, pepsin, pronase, etc.) or antigen retrieval solutions. Antigen retrieval solutions of interest include, for example, citrate buffer, pH 6.0 (Dako), tris buffer, pH 9.5 (Biocare), EDTA, pH 8.0 (Biocare), L.A.B. (“Liberate Antibody Binding Solution;” Polysciences), antigen retrieval Glyca solution (Biogenex), citrate buffer solution, pH 4.0 (Zymed), Dawn® detergent (Procter & Gamble), deionized water, and 2% glacial acetic acid. In some embodiments, antigen retrieval comprises applying the antigen retrieval solution to a formalin-fixed tissue sample and then heating the sample in an oven (e.g., 60° C.), steamer (e.g., 95° C.), or pressure cooker (e.g., 120° C.) at specified temperatures for defined time periods. In other aspects of the invention, antigen retrieval may be performed at room temperature. Incubation times will vary with the particular antigen retrieval solution selected and with the incubation temperature. For example, an antigen retrieval solution may be applied to a sample for as little as 5, 10, 20, or 30 minutes or up to overnight. The design of assays to determine the appropriate antigen retrieval solution and optimal incubation times and temperatures is standard and well within the routine capabilities of those of ordinary skill in the art.
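The routine optimization described above amounts to testing combinations of retrieval solution, heating condition, and incubation time. The sketch below enumerates such a test grid using a subset of the solutions, the example temperatures, and the example incubation times named in the text. The data structure and function name are assumptions, and which combinations are practical for a given antibody must be determined experimentally.

```python
from itertools import product
from typing import Dict, List

# Subset of the antigen retrieval solutions listed in the text.
RETRIEVAL_SOLUTIONS = [
    "citrate buffer, pH 6.0",
    "tris buffer, pH 9.5",
    "EDTA, pH 8.0",
]

# Heating devices and the example temperatures (degrees C) given in the text.
HEATING_CONDITIONS = {"oven": 60, "steamer": 95, "pressure cooker": 120}

# Example incubation times (minutes) given in the text.
INCUBATION_MINUTES = [5, 10, 20, 30]


def retrieval_conditions() -> List[Dict[str, object]]:
    """Enumerate every solution / heating / time combination to be tested."""
    conditions = []
    for solution, (device, temp_c), minutes in product(
            RETRIEVAL_SOLUTIONS, HEATING_CONDITIONS.items(), INCUBATION_MINUTES):
        conditions.append({
            "solution": solution,
            "device": device,
            "temperature_c": temp_c,
            "minutes": minutes,
        })
    return conditions
```

With three solutions, three heating conditions, and four incubation times, the grid contains 36 candidate conditions, each of which could be scored for staining quality on a control tissue section.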

Following antigen retrieval, samples are blocked using an appropriate blocking agent, e.g., hydrogen peroxide. An antibody directed to a biomarker of interest is then incubated with the sample for a time sufficient to permit antigen-antibody binding. As noted above, one of skill in the art will appreciate that a more accurate breast cancer prognosis may be obtained in some cases by detecting overexpression of more than one biomarker in a patient sample. Therefore, in particular embodiments, at least two antibodies directed to two distinct biomarkers are used to evaluate the prognosis of a breast cancer patient. Where more than one antibody is used, these antibodies may be added to a single sample sequentially as individual antibody reagents or simultaneously as an antibody cocktail. Alternatively, each individual antibody may be added to a separate tissue section from a single patient sample, and the resulting data pooled.

Techniques for detecting antibody binding are well known in the art. Antibody binding to a biomarker of interest may be detected through the use of chemical reagents that generate a detectable signal that corresponds to the level of antibody binding and, accordingly, to the level of biomarker protein expression. For example, antibody binding can be detected through the use of a secondary antibody that is conjugated to a labeled polymer. Examples of labeled polymers include but are not limited to polymer-enzyme conjugates. The enzymes in these complexes are typically used to catalyze the deposition of a chromogen at the antigen-antibody binding site, thereby resulting in cell staining that corresponds to expression level of the biomarker of interest. Enzymes of particular interest include horseradish peroxidase (HRP) and alkaline phosphatase (AP). Commercial antibody detection systems, such as, for example the Dako Envision+ system and Biocare Medical's Mach 3 system, may be used to practice the present invention.

In one immunohistochemistry method of the invention, antibody binding to a biomarker is detected through the use of an HRP-labeled polymer that is conjugated to a secondary antibody. Slides are stained for antibody binding using the chromogen 3,3′-diaminobenzidine (DAB) and then counterstained with hematoxylin and, optionally, a bluing agent such as ammonium hydroxide. In some aspects of the invention, slides are reviewed microscopically by a pathologist to assess cell staining (i.e., biomarker overexpression) and to evaluate breast cancer prognosis. Alternatively, samples may be reviewed via automated microscopy or by personnel with the assistance of computer software that facilitates the identification of positive staining cells.

The terms “antibody” and “antibodies” broadly encompass naturally occurring forms of antibodies and recombinant antibodies such as single-chain antibodies, chimeric and humanized antibodies and multi-specific antibodies as well as fragments and derivatives of all of the foregoing, which fragments and derivatives have at least an antigenic binding site. Antibody derivatives may comprise a protein or chemical moiety conjugated to the antibody.

“Antibodies” and “immunoglobulins” (Igs) are glycoproteins having the same structural characteristics. While antibodies exhibit binding specificity to an antigen, immunoglobulins include both antibodies and other antibody-like molecules that lack antigen specificity. Polypeptides of the latter kind are, for example, produced at low levels by the lymph system and at increased levels by myelomas.

The term “antibody” is used in the broadest sense and covers fully assembled antibodies, antibody fragments that can bind antigen (e.g., Fab′, F(ab′)₂, Fv, single chain antibodies, diabodies), and recombinant peptides comprising the foregoing.

The term “monoclonal antibody” as used herein refers to an antibody obtained from a population of substantially homogeneous antibodies, i.e., the individual antibodies comprising the population are identical except for possible naturally-occurring mutations that may be present in minor amounts.

“Antibody fragments” comprise a portion of an intact antibody, preferably the antigen-binding or variable region of the intact antibody. Examples of antibody fragments include Fab, Fab′, F(ab′)2, and Fv fragments; diabodies; linear antibodies (Zapata et al. (1995) Protein Eng. 8(10):1057-1062); single-chain antibody molecules; and multispecific antibodies formed from antibody fragments. Papain digestion of antibodies produces two identical antigen-binding fragments, called “Fab” fragments, each with a single antigen-binding site, and a residual “Fc” fragment, whose name reflects its ability to crystallize readily. Pepsin treatment yields an F(ab′)2 fragment that has two antigen-combining sites and is still capable of cross-linking antigen.

“Fv” is the minimum antibody fragment that contains a complete antigen recognition and binding site. In a two-chain Fv species, this region consists of a dimer of one heavy- and one light-chain variable domain in tight, non-covalent association. In a single-chain Fv species, one heavy- and one light-chain variable domain can be covalently linked by a flexible peptide linker such that the light and heavy chains can associate in a “dimeric” structure analogous to that in a two-chain Fv species. It is in this configuration that the three CDRs of each variable domain interact to define an antigen-binding site on the surface of the V_(H)-V_(L) dimer. Collectively, the six CDRs confer antigen-binding specificity to the antibody. However, even a single variable domain (or half of an Fv comprising only three CDRs specific for an antigen) has the ability to recognize and bind antigen, although at a lower affinity than the entire binding site.

The Fab fragment also contains the constant domain of the light chain and the first constant domain (C_(H)1) of the heavy chain. Fab′ fragments differ from Fab fragments by the addition of a few residues at the carboxy terminus of the heavy-chain C_(H)1 domain, including one or more cysteines from the antibody hinge region. Fab′-SH is the designation herein for Fab′ in which the cysteine residue(s) of the constant domains bear a free thiol group. F(ab′)2 antibody fragments originally were produced as pairs of Fab′ fragments that have hinge cysteines between them.

Monoclonal antibodies can be prepared using the method of Kohler et al. (1975) Nature 256:495-496, or a modification thereof. Typically, a mouse is immunized with a solution containing an antigen. Immunization can be performed by mixing or emulsifying the antigen-containing solution in saline, preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally. Any method of immunization known in the art may be used to obtain the monoclonal antibodies of the invention. After immunization of the animal, the spleen (and optionally, several large lymph nodes) are removed and dissociated into single cells. The spleen cells may be screened by applying a cell suspension to a plate or well coated with the antigen of interest. The B cells expressing membrane bound immunoglobulin specific for the antigen bind to the plate and are not rinsed away. Resulting B cells, or all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a selective medium. The resulting cells are plated by serial dilution and are assayed for the production of antibodies that specifically bind the antigen of interest (and that do not bind to unrelated antigens). The selected monoclonal antibody (mAb)-secreting hybridomas are then cultured either in vitro (e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

As an alternative to the use of hybridomas, antibody can be produced in a cell line such as a CHO cell line, as disclosed in U.S. Pat. Nos. 5,545,403; 5,545,405; and 5,998,144; incorporated herein by reference. Briefly, the cell line is transfected with vectors capable of expressing a light chain and a heavy chain, respectively. By transfecting the two proteins on separate vectors, chimeric antibodies can be produced. Another advantage is the correct glycosylation of the antibody. A monoclonal antibody can also be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with a biomarker protein to thereby isolate immunoglobulin library members that bind the biomarker protein. Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP™ Phage Display Kit, Catalog No. 240612). Additionally, examples of methods and reagents particularly amenable for use in generating and screening antibody display libraries can be found in, for example, U.S. Pat. No. 5,223,409; PCT Publication Nos. WO 92/18619; WO 91/17271; WO 92/20791; WO 92/15679; WO 93/01288; WO 92/01047; WO 92/09690; and WO 90/02809; Fuchs et al. (1991) Bio/Technology 9:1370-1372; Hay et al. (1992) Hum. Antibod. Hybridomas 3:81-85; Huse et al. (1989) Science 246:1275-1281; Griffiths et al. (1993) EMBO J. 12:725-734.

Polyclonal antibodies can be prepared by immunizing a suitable subject (e.g., rabbit, goat, mouse, or other mammal) with a biomarker protein immunogen. The antibody titer in the immunized subject can be monitored over time by standard techniques, such as with an enzyme linked immunosorbent assay (ELISA) using immobilized biomarker protein. At an appropriate time after immunization, e.g., when the antibody titers are highest, antibody-producing cells can be obtained from the subject and used to prepare monoclonal antibodies by standard techniques, such as the hybridoma technique originally described by Kohler and Milstein (1975) Nature 256:495-497, the human B cell hybridoma technique (Kozbor et al. (1983) Immunol. Today 4:72), the EBV-hybridoma technique (Cole et al. (1985) in Monoclonal Antibodies and Cancer Therapy, ed. Reisfeld and Sell (Alan R. Liss, Inc., New York, N.Y.), pp. 77-96) or trioma techniques. The technology for producing hybridomas is well known (see generally Coligan et al., eds. (1994) Current Protocols in Immunology (John Wiley & Sons, Inc., New York, N.Y.); Galfre et al. (1977) Nature 266:550-52; Kenneth (1980) in Monoclonal Antibodies: A New Dimension In Biological Analyses (Plenum Publishing Corp., NY); and Lerner (1981) Yale J. Biol. Med., 54:387-402).

The compositions of the invention further comprise monoclonal antibodies and variants and fragments thereof that specifically bind to biomarker proteins of interest. For example, monoclonal antibodies specific for SLPI (designated clone 5G6.24), DARPP-32 (8G11.20), MGC 14832 (1F3.9 and 2D1.14), NDRG-1 (10A9.34), PSMB9 (3A2.4), and MUC-1 (16E3.3) are provided. The monoclonal antibodies may be labeled with a detectable substance as described below to facilitate biomarker protein detection in the sample. Such antibodies find use in practicing the methods of the invention. Monoclonal antibodies having the binding characteristics of the antibodies disclosed herein are also encompassed by the present invention. Compositions further comprise antigen-binding variants and fragments of the monoclonal antibodies, hybridoma cell lines producing these antibodies, and isolated nucleic acid molecules encoding the amino acid sequences of these monoclonal antibodies.

Antibodies having the binding characteristics of a monoclonal antibody of the invention are also provided. “Binding characteristics” or “binding specificity” when used in reference to an antibody means that the antibody recognizes the same or similar antigenic epitope as a comparison antibody. Examples of such antibodies include, for example, an antibody that competes with a monoclonal antibody of the invention in a competitive binding assay. One of skill in the art could determine whether an antibody competitively interferes with another antibody using standard methods.

By “epitope” is intended the part of an antigenic molecule to which an antibody is produced and to which the antibody will bind. Epitopes can comprise linear amino acid residues (i.e., residues within the epitope are arranged sequentially one after another in a linear fashion), nonlinear amino acid residues (referred to herein as “nonlinear epitopes”; these epitopes are not arranged sequentially), or both linear and nonlinear amino acid residues. Typically epitopes are short amino acid sequences, e.g. about five amino acids in length. Systematic techniques for identifying epitopes are known in the art and are described, for example, in U.S. Pat. No. 4,708,871. Briefly, a set of overlapping oligopeptides derived from the antigen may be synthesized and bound to a solid phase array of pins, with a unique oligopeptide on each pin. The array of pins may comprise a 96-well microtiter plate, permitting one to assay all 96 oligopeptides simultaneously, e.g., for binding to a biomarker-specific monoclonal antibody. Alternatively, phage display peptide library kits (New England BioLabs) are currently commercially available for epitope mapping. Using these methods, the binding affinity for every possible subset of consecutive amino acids may be determined in order to identify the epitope that a given antibody binds. Epitopes may also be identified by inference when epitope length peptide sequences are used to immunize animals from which antibodies are obtained.
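
The overlapping-oligopeptide approach described above can be illustrated with a minimal sketch. The function names, the window length of 8, the step of 1, and the 20-residue toy sequence are all hypothetical choices made for illustration; they are not taken from U.S. Pat. No. 4,708,871 or from any biomarker sequence of the invention.

```python
def overlapping_peptides(sequence, length=8, step=1):
    """Return (start_index, peptide) pairs from a sliding window over the antigen.

    `length` and `step` are illustrative defaults, not values from the specification.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(sequence) - length
    return [(i, sequence[i:i + length]) for i in range(0, last_start + 1, step)]


def fits_on_plate(peptides, wells=96):
    """Check whether one peptide per pin fits on a single plate-format pin array."""
    return len(peptides) <= wells


# Hypothetical toy antigen (each of the 20 standard residues once).
toy_antigen = "ACDEFGHIKLMNPQRSTVWY"
peptides = overlapping_peptides(toy_antigen, length=8, step=1)
print(len(peptides), fits_on_plate(peptides))
```

A step of 1 gives the finest mapping resolution at the cost of more pins; a larger step reduces the number of peptides but widens the region to which binding can be localized.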

Antigen-binding fragments and variants of the monoclonal antibodies disclosed herein are further provided. Such variants will retain the desired binding properties of the parent antibody. Methods for making antibody fragments and variants are generally available in the art. For example, amino acid sequence variants of a monoclonal antibody described herein can be prepared by mutations in the cloned DNA sequence encoding the antibody of interest. Methods for mutagenesis and nucleotide sequence alterations are well known in the art. See, for example, Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York); Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) Methods Enzymol. 154:367-382; Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor, N.Y.); U.S. Pat. No. 4,873,192; and the references cited therein; herein incorporated by reference. Guidance as to appropriate amino acid substitutions that do not affect biological activity of the polypeptide of interest may be found in the model of Dayhoff et al. (1978) in Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, D.C.), herein incorporated by reference. Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferred. Examples of conservative substitutions include, but are not limited to, Gly⇔Ala, Val⇔Ile⇔Leu, Asp⇔Glu, Lys⇔Arg, Asn⇔Gln, and Phe⇔Trp⇔Tyr.

In constructing variants of the antibody polypeptide of interest, modifications are made such that variants continue to possess the desired activity, i.e., similar binding affinity to the biomarker. Obviously, any mutations made in the DNA encoding the variant polypeptide must not place the sequence out of reading frame and preferably will not create complementary regions that could produce secondary mRNA structure. See EP Patent Application Publication No. 75,444.

Preferably, variants of a reference biomarker antibody have amino acid sequences that have at least 70% or 75% sequence identity, preferably at least 80% or 85% sequence identity, more preferably at least 90%, 91%, 92%, 93%, 94% or 95% sequence identity to the amino acid sequence for the reference antibody molecule, or to a shorter portion of the reference antibody molecule. More preferably, the molecules share at least 96%, 97%, 98% or 99% sequence identity. For purposes of the present invention, percent sequence identity is determined using the Smith-Waterman homology search algorithm using an affine gap search with a gap open penalty of 12, a gap extension penalty of 2, and the BLOSUM62 scoring matrix. The Smith-Waterman homology search algorithm is taught in Smith and Waterman (1981) Adv. Appl. Math. 2:482-489. A variant may, for example, differ from the reference antibody by as few as 1 to 15 amino acid residues, as few as 1 to 10 amino acid residues, such as 6-10, as few as 5, as few as 4, 3, 2, or even 1 amino acid residue.

With respect to optimal alignment of two amino acid sequences, the contiguous segment of the variant amino acid sequence may have additional amino acid residues or deleted amino acid residues with respect to the reference amino acid sequence. The contiguous segment used for comparison to the reference amino acid sequence will include at least 20 contiguous amino acid residues, and may be 30, 40, 50, or more amino acid residues. Corrections for sequence identity associated with conservative residue substitutions or gaps can be made (see Smith-Waterman homology search algorithm).
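
By way of illustration only, the following sketch shows a Smith-Waterman local alignment with affine gap penalties (gap open 12, gap extension 2) followed by a percent identity calculation. Several points are assumptions rather than statements of the specification: a simple match/mismatch score (+5/-4) stands in for the BLOSUM62 matrix, which a real implementation would substitute through the `score` argument; the first residue of a gap is charged the gap open penalty and each further residue the gap extension penalty; and percent identity is taken as identical positions divided by the number of aligned columns, gaps included. The function names and toy sequences are hypothetical.

```python
NEG_INF = float("-inf")


def simple_score(x, y):
    """Stand-in scoring (assumption): +5 match, -4 mismatch, in place of BLOSUM62."""
    return 5 if x == y else -4


def smith_waterman_affine(a, b, score=simple_score, gap_open=12, gap_extend=2):
    """Local alignment with affine gaps; returns (best_score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in `a`
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in `b`
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, tracking which matrix the path is in.
    i, j = best_pos
    state, top, bottom = "H", [], []
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            top.append("-"); bottom.append(b[j - 1])
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"   # this column opened the gap
            j -= 1
        else:
            top.append(a[i - 1]); bottom.append("-")
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of aligned columns (gaps count as columns)."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Hypothetical toy sequences: the variant lacks one residue of the reference.
reference = "ACDEFGHIKLMNPQRSTVWY"
variant = "ACDEFGHILMNPQRSTVWY"
best_score, ref_aln, var_aln = smith_waterman_affine(reference, variant)
identity = percent_identity(ref_aln, var_aln)
```

Other conventions for the denominator (for example, the length of the shorter sequence or of the reference segment) give different percentages, so the convention should be fixed before comparing a value against the thresholds recited above.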

The antibodies used to practice the invention are selected to have specificity for the biomarker proteins of interest. Methods for making antibodies and for selecting appropriate antibodies are known in the art. See, for example, Celis, ed. (in press) Cell Biology & Laboratory Handbook, 3rd edition (Academic Press, New York), which is herein incorporated in its entirety by reference. In some embodiments, commercial antibodies directed to specific biomarker proteins may be used to practice the invention. The antibodies of the invention may be selected on the basis of desirable staining of histological samples. That is, in preferred embodiments the antibodies are selected with the end sample type (e.g., formalin-fixed, paraffin-embedded breast tumor tissue samples) in mind and for binding specificity.

In some aspects of the invention, antibodies directed to specific biomarkers of interest are selected and purified via a multi-step screening process. In particular embodiments, polydomas are screened to identify biomarker-specific antibodies that possess the desired traits of specificity and sensitivity. As used herein, “polydoma” refers to multiple hybridomas. The polydomas of the invention are typically provided in multi-well tissue culture plates. In the initial antibody screening step, a set of individual slides or tumor tissue microarrays comprising normal (i.e., non-cancerous) breast tissue and stage I, II, III, and IV breast tumor samples is used. Methods and equipment, such as the Chemicon® Advanced Tissue Arrayer, for generating arrays of multiple tissues on a single slide are known in the art. See, for example, U.S. Pat. No. 4,820,504. Undiluted supernatants from each well containing a polydoma are assayed for positive staining using standard immunohistochemistry techniques. At this initial screening step, background, non-specific binding is essentially ignored. Polydomas producing positive staining are selected and used in the second phase of antibody screening.

In the second screening step, the positive polydomas are subjected to a limiting dilution process. The resulting unscreened antibodies are assayed via standard immunohistochemistry techniques for positive staining of breast tumor tissue samples with known 5-year outcomes. To do this, tissue microarrays comprising normal breast tissue, early-stage breast tumor samples with known good 5-year outcomes, early-stage breast tumor samples with known bad 5-year outcomes, normal non-breast tissue, and cancerous non-breast tissue are generated. At this stage, background staining is relevant, and the candidate polydomas that stain positive for abnormal cells (i.e., cancer cells) only are selected for further analysis to identify antibodies that differentiate good and bad outcome patient samples.

Positive-staining cultures are prepared as individual clones in order to select individual candidate monoclonal antibodies. Methods for isolating individual clones and for purifying antibodies through affinity adsorption chromatography are well known in the art. Individual clones are further analyzed to determine the optimized antigen retrieval conditions and working dilution.

One of skill in the art will recognize that optimization of staining reagents and conditions, for example, antibody titer and detection chemistry parameters, is needed to maximize the signal to noise ratio for a particular antibody. Antibody concentrations that maximize specific binding to the biomarkers of the invention and minimize non-specific binding (or “background”) will be determined. In particular embodiments, appropriate antibody titers are determined by initially testing various antibody dilutions on formalin-fixed, paraffin-embedded normal and cancerous breast tissue samples. The design of assays to optimize antibody titer and detection conditions is standard and well within the routine capabilities of those of ordinary skill in the art. Some antibodies require additional optimization to reduce background staining and/or to increase specificity and sensitivity of staining.

Furthermore, one of skill in the art will recognize that the concentration of a particular antibody used to practice the methods of the invention will vary depending on such factors as time for binding, level of specificity of the antibody for the biomarker protein, and method of body sample preparation. Moreover, when multiple antibodies are used in a single sample, the required concentration may be affected by the order in which the antibodies are applied to the sample, i.e., simultaneously as a cocktail or sequentially as individual antibody reagents. Furthermore, the detection chemistry used to visualize antibody binding to a biomarker of interest must also be optimized to produce the desired signal to noise ratio. One example of optimization of staining reagents and conditions for immunohistochemistry is described in Example 6.

Detection of antibody binding can be facilitated by coupling the antibody to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include ¹²⁵I, ¹³¹I, ³⁵S, or ³H.

In regard to detection of antibody staining in the immunohistochemistry methods of the invention, there also exist in the art video-microscopy and software methods for the quantitative determination of an amount of multiple molecular species (e.g., biomarker proteins) in a biological sample wherein each molecular species present is indicated by a representative dye marker having a specific color. Such methods are also known in the art as colorimetric analysis methods. In these methods, video-microscopy is used to provide an image of the biological sample after it has been stained to visually indicate the presence of a particular biomarker of interest. Some of these methods, such as those disclosed in U.S. patent application Ser. No. 09/957,446 to Marcelpoil et al. and U.S. patent application Ser. No. 10/057,729 to Marcelpoil et al., incorporated herein by reference, disclose the use of an imaging system and associated software to determine the relative amounts of each molecular species present based on the presence of representative color dye markers as indicated by those color dye markers' optical density or transmittance value, respectively, as determined by an imaging system and associated software. These techniques provide quantitative determinations of the relative amounts of each molecular species in a stained biological sample using a single video image that is “deconstructed” into its component color parts.
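
The relationship between transmittance and optical density, and the idea of separating a pixel into contributions from two dyes, can be sketched as follows. This is a generic Beer-Lambert illustration and is not the method of the cited Marcelpoil et al. applications. The function names, the two-channel simplification, the background intensity of 255, and the per-stain optical density vectors are all hypothetical values chosen for the example; in practice such vectors would come from calibration on singly stained slides.

```python
import math


def optical_density(intensity, background=255.0):
    """Optical density from transmitted intensity: OD = -log10(I / I0)."""
    intensity = min(max(intensity, 1e-6), background)  # avoid log of zero
    return -math.log10(intensity / background)


def unmix_two_stains(od_pixel, stain_a, stain_b):
    """Solve od = amount_a * stain_a + amount_b * stain_b in two color channels.

    stain_a and stain_b are per-channel optical densities of one unit of each
    dye (hypothetical calibration values). Solved with Cramer's rule.
    """
    (a1, a2), (b1, b2) = stain_a, stain_b
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("stain vectors are not linearly independent")
    o1, o2 = od_pixel
    amount_a = (o1 * b2 - o2 * b1) / det
    amount_b = (a1 * o2 - a2 * o1) / det
    return amount_a, amount_b


# Hypothetical calibration vectors for a chromogen and a counterstain.
chromogen = (0.2, 0.6)
counterstain = (0.7, 0.3)

# Synthetic pixel built from 1.5 units of chromogen and 0.5 units of counterstain.
true_od = (1.5 * 0.2 + 0.5 * 0.7, 1.5 * 0.6 + 0.5 * 0.3)
pixel = tuple(255.0 * 10 ** (-od) for od in true_od)

measured_od = tuple(optical_density(value) for value in pixel)
chromogen_amount, counterstain_amount = unmix_two_stains(
    measured_od, chromogen, counterstain
)
```

Working in optical density rather than raw intensity is what makes the dye contributions additive, so a linear solve suffices once the per-stain vectors are known.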

The methods of the invention can be used in conjunction with imaging systems and associated imaging software for the detection of biomarker expression. Biomarkers for use in the methods of the invention can be selected based on methods and computer programs such as those disclosed in U.S. Patent Application Publication No. 2006/0078926 entitled “Methods and Computer Programs for Analysis and Optimization of Marker Candidates for Cancer Prognosis,” filed Sep. 22, 2005, and incorporated by reference in its entirety. The methods disclosed therein can be used to develop algorithms for evaluating breast cancer prognosis.

In other embodiments, the expression of a biomarker of interest is detected at the nucleic acid level. Nucleic acid-based techniques for assessing expression are well known in the art and include, for example, determining the level of biomarker mRNA in a body sample. Many expression detection methods use isolated RNA. Any RNA isolation technique that does not select against the isolation of mRNA can be utilized for the purification of RNA (see, e.g., Ausubel et al., ed., Current Protocols in Molecular Biology, John Wiley & Sons, New York 1987-1999). Additionally, large numbers of tissue samples can readily be processed using techniques well known to those of skill in the art, such as, for example, the single-step RNA isolation process of Chomczynski (1989, U.S. Pat. No. 4,843,155).

The term “probe” refers to any molecule that is capable of selectively binding to a specifically intended target biomolecule, for example, a nucleotide transcript or a protein encoded by or corresponding to a biomarker. Probes can be synthesized by one of skill in the art, or derived from appropriate biological preparations. Probes may be specifically designed to be labeled. Examples of molecules that can be utilized as probes include, but are not limited to, RNA, DNA, proteins, antibodies, and organic molecules.

Isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, polymerase chain reaction analyses and probe arrays. One method for the detection of mRNA levels involves contacting the isolated mRNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected. The nucleic acid probe can be, for example, a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an mRNA or genomic DNA encoding a biomarker of the present invention. Hybridization of an mRNA with the probe indicates that the biomarker in question is being expressed.
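
As a simple illustration of how an oligonucleotide probe complementary to a target transcript might be derived, the sketch below returns the antisense DNA sequence for a chosen segment of an mRNA. The function name, the toy transcript, and the segment coordinates are hypothetical; the sketch addresses base complementarity only and says nothing about specificity or hybridization stringency, which would have to be evaluated separately.

```python
# Watson-Crick complements, written so that the probe is DNA even when the
# target is given as RNA (U in the target pairs with A in the probe).
_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_probe(target, start, length):
    """Return the DNA reverse complement of target[start:start + length]."""
    if length <= 0 or start < 0 or start + length > len(target):
        raise ValueError("segment lies outside the target sequence")
    segment = target[start:start + length].upper()
    return "".join(_COMPLEMENT[base] for base in reversed(segment))


# Hypothetical 10-nucleotide transcript fragment, read 5' to 3'.
toy_mrna = "AUGGCCAUUG"
probe = antisense_probe(toy_mrna, 0, 10)
```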

In one embodiment, the mRNA is immobilized on a solid surface and contacted with a probe, for example by running the isolated mRNA on an agarose gel and transferring the mRNA from the gel to a membrane, such as nitrocellulose. In an alternative embodiment, the probe(s) are immobilized on a solid surface and the mRNA is contacted with the probe(s), for example, in an Affymetrix gene chip array. A skilled artisan can readily adapt known mRNA detection methods for use in detecting the level of mRNA encoded by the biomarkers of the present invention.

An alternative method for determining the level of biomarker mRNA in a sample involves the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in Mullis, 1987, U.S. Pat. No. 4,683,202), ligase chain reaction (Barany, 1991, Proc. Natl. Acad. Sci. USA, 88:189-193), self sustained sequence replication (Guatelli et al., 1990, Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional amplification system (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi et al., 1988, Bio/Technology 6:1197), rolling circle replication (Lizardi et al., U.S. Pat. No. 5,854,033) or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well known to those of skill in the art. These detection schemes are especially useful for the detection of nucleic acid molecules if such molecules are present in very low numbers. In particular aspects of the invention, biomarker expression is assessed by quantitative fluorogenic RT-PCR (i.e., the TaqMan® System).

Biomarker expression levels of RNA may be monitored using a membrane blot (such as used in hybridization analysis such as Northern, Southern, dot, and the like), or microwells, sample tubes, gels, beads or fibers (or any solid support comprising bound nucleic acids). See U.S. Pat. Nos. 5,770,722, 5,874,219, 5,744,305, 5,677,195 and 5,445,934, which are incorporated herein by reference. The detection of biomarker expression may also comprise using nucleic acid probes in solution.

In one embodiment of the invention, microarrays are used to detect biomarker expression. Microarrays are particularly well suited for this purpose because of the reproducibility between different experiments. DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860, and 6,344,316, which are incorporated herein by reference. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.

Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, e.g., U.S. Pat. No. 5,384,261, incorporated herein by reference in its entirety for all purposes. Although a planar array surface is preferred, the array may be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays may be peptides or nucleic acids on beads, gels, polymeric surfaces, fibers such as fiber optics, glass or any other appropriate substrate, see U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, each of which is hereby incorporated in its entirety for all purposes. Arrays may be packaged in such a manner as to allow for diagnostics or other manipulation of an all-inclusive device. See, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591 herein incorporated by reference.

In one approach, total mRNA isolated from the sample is converted to labeled cRNA and then hybridized to an oligonucleotide array. Each sample is hybridized to a separate array. Relative transcript levels may be calculated by reference to appropriate controls present on the array and in the sample.
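
One way such a calculation could look is sketched below, under stated assumptions: each biomarker probe intensity is divided by the mean intensity of control probes on the same array, and the normalized value for a patient sample is then expressed relative to a reference sample. The function names and all intensity values are hypothetical, and other normalization schemes are equally possible.

```python
from statistics import mean


def normalized_signal(probe_intensity, control_intensities):
    """Scale a probe intensity by the mean of control probes on the same array."""
    control_mean = mean(control_intensities)
    if control_mean <= 0:
        raise ValueError("control intensities must average above zero")
    return probe_intensity / control_mean


def relative_level(sample_normalized, reference_normalized):
    """Ratio of normalized sample signal to normalized reference signal."""
    if reference_normalized <= 0:
        raise ValueError("reference signal must be positive")
    return sample_normalized / reference_normalized


# Hypothetical intensities: one biomarker probe and two control probes per array.
tumor = normalized_signal(1000, [400, 600])
reference = normalized_signal(250, [500, 500])
fold_change = relative_level(tumor, reference)
```

Normalizing within each array before comparing across arrays reduces the effect of array-to-array differences in labeling and hybridization efficiency.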

Kits for practicing the methods of the invention are further provided. By “kit” is intended any manufacture (e.g., a package or a container) comprising at least one reagent, e.g. an antibody, a nucleic acid probe, etc. for specifically detecting the expression of a biomarker of the invention. The kit may be promoted, distributed, or sold as a unit for performing the methods of the present invention. Additionally, the kits may contain a package insert describing the kit and methods for its use.

In particular embodiments, kits for practicing the immunohistochemistry methods of the invention are provided. Such kits are compatible with both manual and automated immunohistochemistry techniques (e.g., cell staining) as described herein below in Example 1. These kits comprise at least one antibody directed to a biomarker protein of interest. Chemicals for the detection of antibody binding to the biomarker, a counterstain, and a bluing agent to facilitate identification of positive staining cells are optionally provided. Alternatively, the immunohistochemistry kits of the present invention are used in conjunction with commercial antibody binding detection systems, such as, for example, the Dako Envision+ system and Biocare Medical's Mach 3 system. Any chemicals that detect antigen-antibody binding may be used in the practice of the invention. In some embodiments, the detection chemicals comprise a labeled polymer conjugated to a secondary antibody. For example, a secondary antibody that is conjugated to an enzyme that catalyzes the deposition of a chromogen at the antigen-antibody binding site may be provided. Such enzymes and techniques for using them in the detection of antibody binding are well known in the art. In one embodiment, the kit comprises a secondary antibody that is conjugated to an HRP-labeled polymer. Chromogens compatible with the conjugated enzyme (e.g., DAB in the case of an HRP-labeled secondary antibody) and solutions, such as hydrogen peroxide, for blocking non-specific staining may be further provided. The kits of the present invention may also comprise a counterstain, such as, for example, hematoxylin. A bluing agent (e.g., ammonium hydroxide) may be further provided in the kit to facilitate detection of positive staining cells.

In another embodiment, the immunohistochemistry kits of the invention comprise at least two reagents, e.g., antibodies, for specifically detecting the expression of at least two distinct biomarkers. Each antibody may be provided in the kit as an individual reagent or, alternatively, as an antibody cocktail comprising all of the antibodies directed to the different biomarkers of interest. Furthermore, any or all of the kit reagents may be provided within containers that protect them from the external environment, such as in sealed containers. Positive and/or negative controls may be included in the kits to validate the activity and correct usage of reagents employed in accordance with the invention. Controls may include samples, such as tissue sections, cells fixed on glass slides, etc., known to be either positive or negative for the presence of the biomarker of interest. The design and use of controls is standard and well within the routine capabilities of those of ordinary skill in the art.

In other embodiments, kits for evaluating the prognosis of a breast cancer patient comprising detecting biomarker overexpression at the nucleic acid level are further provided. Such kits comprise, for example, at least one nucleic acid probe that specifically binds to a biomarker nucleic acid or fragment thereof. In particular embodiments, the kits comprise at least two nucleic acid probes that hybridize with distinct biomarker nucleic acids.

One of skill in the art will appreciate that any or all steps in the methods of the invention could be implemented by personnel or, alternatively, performed in an automated fashion. Thus, the steps of body sample preparation, sample staining, and detection of biomarker expression may be automated. Moreover, in some embodiments, the immunohistochemical methods of the invention are used in conjunction with computerized imaging equipment and software to facilitate the identification of positive-staining cells by a pathologist. The methods disclosed herein can also be combined with other prognostic methods or analyses (e.g., tumor size, lymph node status, expression levels of Her2/neu, Ki67, and p53). In this manner detection of overexpression of the biomarkers of the invention can permit a more accurate determination of the prognosis of a breast cancer patient.

The articles “a” and “an” are used herein to refer to one or more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one or more elements.

Throughout the specification the word “comprise,” or variations such as “comprises” or “comprising,” will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

The following examples are offered by way of illustration and not by way of limitation:

EXPERIMENTAL

Example 1 Detection of Biomarker Overexpression Using Immunohistochemistry

Slide Preparation

4 μm sections of formalin-fixed, paraffin-embedded breast tumor tissue samples are cut using a microtome and placed on SuperFrost+ slides (VWR). The slides are baked in a forced air oven for 20 minutes and then contacted with a Histo-Orienter until the paraffin melts. Slides are washed three times with xylene for 5 minutes to remove paraffin and then rinsed three times in absolute alcohol at 2 minutes/rinse.

Pretreatment and Antigen Retrieval

To prevent non-specific background staining, the slides are incubated in a hydrogen peroxide/methanol block for five minutes at room temperature. Slides are then rinsed thoroughly with several changes of dH₂O.

In order to make the antigens accessible to antibody binding, slides are incubated in an antigen retrieval solution in a pressure cooker for 5 minutes. Slides are allowed to cool to room temperature for 20 minutes on the bench, and the citrate buffer is gradually replaced with dH₂O, tris buffered saline (TBS), or phosphate buffered saline (PBS) by successive dilutions. The slides are then rinsed three times in TBS at 2 minutes per rinse. To break the surface tension, 750 μl/50 ml of 1% BSA/TBS is added to each slide.

Manual Immunohistochemistry

To prevent non-specific background staining, slides are not permitted to dry out during the staining procedure. Slides that have been subjected to antigen retrieval are loaded into a humidity chamber filled with water moistened paper towels. A SLPI antibody (clone 5G6.24; 1:100 dilution) is applied to the slide in a volume sufficient to completely cover the tissue section for 1 hour at room temperature. Following incubation with the primary antibody, the slides are rinsed three times in TBS at 2 minutes per wash. 750 μl/50 ml of 1% BSA/TBS is added to the final wash.

The Dako Envision+ HRP-labeled polymer secondary antibody is applied to the slide for 30 minutes at room temperature, followed by a TBS rinse. The HRP substrate chromogen DAB is applied for 10 minutes, and then the slides are rinsed for 5 minutes with water. Each slide is counterstained with hematoxylin for 5 seconds and then rinsed with water until clear. Following counterstaining, the slides are “blued” by soaking in ammonia water for 10 seconds and then rinsed with water for 1 minute.

Samples are dehydrated by immersing the slides in 95% ethanol for 1 minute and then in absolute ethanol for an additional minute. Slides are cleared by rinsing 3 times in xylene for 1 minute per rinse. Slides are then coverslipped with permanent mounting media and incubated at 35° C. to dry. Biomarker staining is visualized using a bright-field microscope. Scoring is performed by a board certified pathologist in a blind manner.

Automated Immunohistochemistry

The Dako Autostainer Universal Staining system is programmed according to the manufacturer's instructions, and the necessary staining and counterstaining reagents described above for manual immunohistochemistry are loaded onto the machine. The prepared slides are loaded onto the Autostainer, and the program is run. At the end of the run, the slides are removed and rinsed in water for 5 minutes. The slides are dehydrated, cleared, coverslipped, and analyzed as described above.

Example 2 Detection of Overexpression of Individual Biomarkers in Clinical Samples

Approximately 130 breast tumor tissue samples from patients at various disease stages were collected. The average patient age was 77. Actual clinical outcome data for each patient was known, and each patient was categorized as having a good or bad outcome. In this study, good outcome was defined as remaining cancer-free for at least 5 years; bad outcome was defined as suffering disease relapse, recurrence, or death within 5 years. The following table indicates the number of samples within each diagnosis group analyzed, as well as actual clinical outcome data.

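TABLE 1 Clinical Samples Analyzed

| Stage | Good Outcome | Bad Outcome | Total |
|---|---|---|---|
| T1N0 | 50 | 13 | 63 |
| T1N1 | 6 | 4 | 10 |
| T2N0 | 26 | 19 | 45 |
| T2N1 | 9 | 7 | 16 |
| T3N0 | 0 | 3 | 3 |
| T3N1 | 0 | 1 | 1 |

| Lymph Node Status | Good Outcome | Bad Outcome |
|---|---|---|
| N0 | 76 | 35 |
| N1 | 15 | 12 |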

The samples were analyzed by the automated immunohistochemistry described in Example 1 to identify biomarkers whose overexpression is indicative of a bad cancer prognosis. That is, the goal of this clinical study was to identify biomarkers that can distinguish good and bad outcome patient samples. Antibodies were used to detect the overexpression of eight biomarkers of interest: SLPI, PSMB9, NDRG-1, E2F1, p21ras, MUC-1, phospho-p27, and src. For quality control purposes, samples were also analyzed for ER, PR, p53, Ki67, and Her2/neu expression.

Commercial antibodies or monoclonal antibodies, identified by polydoma screening as described herein, directed to the biomarkers of interest were diluted as indicated in Table 2 and used to detect biomarker overexpression. The antigen retrieval conditions for each biomarker are also listed below.

TABLE 2 Antibody Dilutions and Antigen Retrieval Conditions

| Biomarker | Antibody (Dilution) | Antigen Retrieval Conditions |
|---|---|---|
| SLPI | Clone 5G6.24 (1:100) | Citrate buffer, pH 6.0/pressure cooker |
| PSMB9 | Clone 3A2.4 (1:500) | Citrate buffer, pH 4.0/steamer |
| NDRG-1 | Zymed (1:200) | Citrate buffer, pH 4.0/steamer |
| E2F1 | Calbiochem (1:50) | Tris, pH 9.5/pressure cooker |
| p21ras | Dako (1:50) | Citrate buffer, pH 4.0/steamer |
| MUC-1 | Clone 16E3.3 (1:400) | Citrate buffer, pH 4.0/steamer |
| phospho-p27 | Zymed (1:100) | EDTA, pH 8.0/steamer |
| src | Upstate (1:50) | Citrate buffer, pH 4.0/steamer |

Interpretation of Slides

Each slide was reviewed and scored by a board-certified pathologist who was unaware of the actual clinical patient outcomes. Samples were scored for biomarker staining intensity on a scale of 0-3. See, for example, Hanausek and Walaszek, eds. (1998) Tumor Marker Protocols (Humana Press, Inc., Totowa, N.J.); and Shi et al., eds. (2000) Antigen Retrieval Techniques: Immunohistochemistry and Molecular Morphology (Eaton Publishing, Natick, Mass.), both of which are herein incorporated by reference in their entirety. For each biomarker, a threshold staining intensity was established. Samples exhibiting a staining intensity of less than this threshold value for a particular biomarker were deemed negative for that biomarker. The staining intensity threshold values for the biomarkers of interest were as follows:

Src: ≧1

MUC-1: ≧3

Phospho-p27: ≧0.5

PSMB9: ≧0.5

NDRG-1: ≧1

E2F1: ≧3

p21ras: ≧0.5

SLPI: ≧2
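The thresholding rule described above can be sketched in Python as follows. The dictionary simply copies the threshold values listed above; the names `STAINING_THRESHOLDS` and `is_positive` are illustrative assumptions and do not come from the study.

```python
# Staining-intensity thresholds copied from the list above (scale of 0-3).
STAINING_THRESHOLDS = {
    "src": 1.0,
    "MUC-1": 3.0,
    "phospho-p27": 0.5,
    "PSMB9": 0.5,
    "NDRG-1": 1.0,
    "E2F1": 3.0,
    "p21ras": 0.5,
    "SLPI": 2.0,
}


def is_positive(marker: str, intensity: float) -> bool:
    """Return True when the score meets or exceeds the marker's threshold.

    Scores below the threshold are deemed negative for that marker.
    """
    return intensity >= STAINING_THRESHOLDS[marker]
```

For instance, under these thresholds an SLPI score of 2 would count as positive, whereas a MUC-1 score of 2 would not.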

The staining intensity results were compared with the known actual clinical outcome data available for each patient, and each slide was then given a final result of true positive (TP), true negative (TN), false positive (FP), false negative (FN), according to the parameters described below. Sensitivity and specificity values for each biomarker were calculated.

TABLE 3 Slide Classification for Bad Outcome Biomarkers

| Classification | Biomarker Staining | Actual Clinical Outcome* |
|---|---|---|
| True Positive | Positive | Bad outcome |
| True Negative | Negative | Good outcome |
| False Positive | Positive | Good outcome |
| False Negative | Negative | Bad outcome |

*Good clinical outcome = cancer-free survival for at least 5 years. Bad clinical outcome = recurrence or death from the underlying cancer within 5 years.

Calculations Used

Sensitivity = TP/(TP+FN)

Specificity = TN/(FP+TN)

Positive Predictive Power (PPP) = TP/(TP+FP)

Negative Predictive Power (NPP) = TN/(FN+TN)

Results

The results for each biomarker are summarized below.

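TABLE 4 Summary of Results with Individual Biomarkers

| | Src | MUC-1 | Phospho-p27 | PSMB9 | NDRG-1 | E2F1 | p21ras | SLPI |
|---|---|---|---|---|---|---|---|---|
| TP | 8 | 7 | 7 | 13 | 7 | 3 | 10 | 5 |
| FP | 8 | 4 | 6 | 15 | 14 | 4 | 11 | 7 |
| FN | 35 | 37 | 44 | 30 | 31 | 34 | 30 | 39 |
| TN | 59 | 70 | 76 | 54 | 54 | 57 | 60 | 64 |
| Sensitivity | 18.60% | 15.91% | 13.73% | 30.23% | 18.42% | 8.11% | 25.00% | 11.36% |
| Specificity | 88.06% | 94.59% | 92.68% | 78.26% | 79.41% | 93.44% | 84.51% | 90.14% |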
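As a check on Table 4, the percentages can be recomputed from the counts using the slide classification of Table 3 and the formulas listed under Calculations Used. The following Python sketch is illustrative only; the function names `classify_slide` and `performance` are assumptions.

```python
def classify_slide(stain_positive: bool, bad_outcome: bool) -> str:
    """Classify one slide according to Table 3 (bad-outcome biomarkers)."""
    if stain_positive:
        return "TP" if bad_outcome else "FP"
    return "FN" if bad_outcome else "TN"


def performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Apply the four formulas listed under Calculations Used."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "ppp": tp / (tp + fp),
        "npp": tn / (fn + tn),
    }


# Src column of Table 4: TP=8, FP=8, FN=35, TN=59.
src = performance(tp=8, fp=8, fn=35, tn=59)
print(f"{src['sensitivity']:.2%} {src['specificity']:.2%}")  # 18.60% 88.06%
```

The same call with the other columns of Table 4 reproduces the remaining sensitivity and specificity entries.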

Example 3 Detection of Biomarker Overexpression in Clinical Samples Combining Biomarkers

In order to determine if the sensitivity and specificity of the methods of the invention could be improved if multiple biomarkers were combined, the data from Example 2 were subjected to further analysis. Thus, various combinations of biomarkers were considered, and samples that stained positive for any of the biomarkers in the combination of interest were deemed positive. These results were compared with the known actual clinical outcome data available for each patient, and each slide was then given a final result of true positive (TP), true negative (TN), false positive (FP), or false negative (FN) as before. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for each combination of biomarkers were calculated.
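The combination rule described above (a sample is positive if any marker in the combination is positive) can be sketched as follows. The function names and the toy data are hypothetical and serve only to illustrate the tallying; they are not taken from the study.

```python
from collections import Counter


def combined_call(marker_calls: dict) -> bool:
    """A sample is positive if any marker in the combination is positive."""
    return any(marker_calls.values())


def tally(samples: list) -> Counter:
    """Count TP/FP/FN/TN over (marker_calls, bad_outcome) pairs."""
    counts = Counter()
    for marker_calls, bad_outcome in samples:
        if combined_call(marker_calls):
            counts["TP" if bad_outcome else "FP"] += 1
        else:
            counts["FN" if bad_outcome else "TN"] += 1
    return counts


# Hypothetical toy data, not study data.
toy = [
    ({"SLPI": False, "PSMB9": True}, True),    # TP
    ({"SLPI": False, "PSMB9": False}, True),   # FN
    ({"SLPI": True, "PSMB9": False}, False),   # FP
    ({"SLPI": False, "PSMB9": False}, False),  # TN
]
counts = tally(toy)
```

Because a single positive marker is enough to call the sample positive, adding markers to a combination can only keep or raise sensitivity and keep or lower specificity.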

Results

The results for each combination of biomarkers are summarized below.

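TABLE 5 SLPI, PSMB9, MUC-1, and phospho-p27

| TP | FP | FN | TN | Sensitivity | Specificity | NPV | PPV |
|---|---|---|---|---|---|---|---|
| 24 | 25 | 23 | 58 | 51.06% | 69.88% | 71.60% | 48.98% |

TABLE 6 SLPI, PSMB9, MUC-1, phospho-p27, and src

| TP | FP | FN | TN | Sensitivity | Specificity | NPV | PPV |
|---|---|---|---|---|---|---|---|
| 28 | 28 | 24 | 60 | 53.85% | 68.18% | 71.43% | 50.00% |

TABLE 7 SLPI, PSMB9, MUC-1, phospho-p27, src, p21ras, E2F1, and NDRG-1

| TP | FP | FN | TN | Sensitivity | Specificity | NPV | PPV |
|---|---|---|---|---|---|---|---|
| 33 | 41 | 19 | 47 | 63.46% | 53.41% | 71.21% | 44.59% |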

Example 4 Detection of Overexpression of Individual Biomarkers in Clinical Samples Using Marker Analysis Research System (MARS)

Over 200 patients were analyzed in this study. As summarized in Table 8 this population of patients was quite heterogeneous and exhibited tumors of different stages ranging from T1N0 to T3N0.

TABLE 8 Patient Population Analyzed

| Stage | Good | Bad | All |
|---|---|---|---|
| T1N0 | 60 | 20 | 80 |
| T1N1 | 6 | 7 | 13 |
| T2N0 | 59 | 39 | 98 |
| T3N0 | 6 | 10 | 16 |
| Totals | 131 | 76 | 207 |

The targeted characteristic of the patients was their good outcome or bad outcome status. In this study, good outcome patients were those still disease-free after five years; bad outcome patients were defined as patients with recurrence, relapse, or death within five years.

Biomarker Selection

The paradigm used for biomarker selection was that biomarker overexpression would capture some of the bad outcome patients and show a very high specificity. Combining different markers would therefore ensure high specificity and gain sensitivity to reach, for example, an 80% sensitivity and 80% specificity. After a multi-step selection process, nine biomarkers were selected for the current study. These markers are shown in Table 9, along with their respective subcellular localization.

TABLE 9 Biomarkers Analyzed

| Marker Name | Localization |
|---|---|
| E2F1 | Nucleus |
| MUC-1 (IF3.9) | Membrane |
| NDRG-1 (ZYMED CAP43) | Cytoplasm (Nucleus + Membrane) |
| p21ras | Cytoplasm |
| p53 | Nucleus |
| Phospho p27 | Cytoplasm (Nucleus) |
| PSMB9 (3A2.4) | Cytoplasm |
| SLPI (5G6.24) | Cytoplasm |
| src | Cytoplasm |

Automated Immunohistochemistry

The patient samples were analyzed by automated immunohistochemistry, essentially as described in Example 1, to identify biomarkers whose overexpression is indicative of a bad cancer prognosis. That is, the goal of this clinical study was to identify biomarkers that can distinguish good and bad outcome patient samples. Antibodies were used to detect the overexpression of the nine biomarkers of interest: SLPI, PSMB9, NDRG-1, E2F1, p21ras, p53, MUC-1, phospho-p27, and src. Samples were also analyzed for ER, PR, Ki67, and Her2/neu (CerbB2) expression.

Slides were prepared as described in Example 1 and subjected to antigen retrieval. Specifically, prepared slides were immersed in an antigen retrieval solution and then placed in a pressure cooker (120-125° C. at 17-23 psi) for 5 minutes. The antigen retrieval solutions for each biomarker are listed below in Table 10.

TABLE 10 Antigen Retrieval Solutions

| Biomarker | Antigen Retrieval Solution |
|---|---|
| SLPI | Citrate pH 6.0 (Dako #S1699) |
| PSMB9 | EDTA (Biocare #CB917L) |
| NDRG-1 | Citrate pH 6.0 (Dako #S1699) |
| E2F1 | EDTA (Biocare #CB917L) |
| p21ras | Citrate buffer pH 6.0 (Dako #S1699) |
| MUC-1 | Citrate pH 6.0 (Dako #S1699) |
| phospho-p27 | Deionized water |
| src | Tris pH 9.5 (Biocare #CB911M) |

Slides were gradually returned to room temperature deionized water. The slides were rinsed 3 times in TBS/tween-20 at 2 minutes per wash. 200 μl of a biomarker-specific antibody was added to each slide and incubated at room temperature for one hour. Commercial antibodies or monoclonal antibodies, identified by polydoma screening as described herein, directed to the biomarkers of interest were used to detect biomarker overexpression.

Following incubation with the primary antibody, slides were rinsed twice with TBS/tween-20. 200 μl of labeled polymer (Dako Envision+ HRP-labeled polymer secondary antibody) was then added for 30 minutes at room temperature. Slides were again rinsed 3 times with TBS/tween-20 prior to the addition of 200 μl of DAB solution for five minutes at room temperature. The slides were then rinsed three times with TBS/tween-20 and one time with deionized water. 200 μl of hematoxylin was added for 5 minutes. The slides were then rinsed 3 times with deionized water, one time with TBS/tween-20, and 2 additional times with deionized water. The slides were dehydrated, cleared, and coverslipped as described in Example 1.

Pathologist Evaluation

A board-certified pathologist manually scored the slides. p53 expression was scored for staining intensity using a scale of 0, 0.5, 1, 2, or 3, percentage of labeled cells, and a clinical diagnostic score. SLPI and PSMB9 were scored for staining intensity using a scale of 0, 0.5, 1, 2, or 3 and percentage of labeled cells. The pathologist also denoted on the slide the tumor area (ROI) used in making the determination. Up to ten individual 20× fields of view from within the selected regions for each tumor, organized in a single focus, were obtained using MARS. The actual number of images obtained from each sample was dependent on the size of the individual tumor. An Excel spreadsheet containing all of the above scoring information along with the patient outcome, lymph node status, and tumor size was generated. The data were subjected to further analysis as described below.

Data Extraction

Using MARS, the following steps were systematically performed for every file:

-   Chromogen separation was optimized for each biomarker using the available slide that showed the best quality stain.
-   Segmentation setup was customized for each biomarker according to its subcellular localization (nucleus, cytoplasm, or membrane).
-   Features were extracted at cell, field of view (FOV), and focus level, within the defined ROI, and exported to an output file (XML format).

Data Analysis

A specific program named Multi Marker Analyzer was developed in order to integrate new analysis algorithms and meet the heavy computation needs for this analysis. This software provided a means to load all or a portion of either TMA or tissue section XML files generated with MARS, to merge data contained in these files using XML files describing the TMA keys (in the case of a TMA analysis) or Excel files giving patient clinical status and patient evaluation (in the case of a tissue section analysis), and to perform all further analyses. This merge process included the association of the parameters measured by MARS for each core (or patient) with the information kept in the TMA key (or the Excel file) about the patient: identification number, medical status (good or bad outcome), and the pathologist evaluation if not included in the XML formatted MARS file.

Because some of the samples did not go through the complete experimental process, the number of analyzed patients was smaller than the number of patients reported in Table 8 above and varies from one biomarker to another. The number of tissue sections analyzed for each biomarker is listed below in Table 11. The number of tissue samples analyzed for the conventional breast cancer markers (i.e., ER, PR, Ki67, and Her2/neu (CerbB2)) is in Table 12.

TABLE 11 Number of Tissue Sections Analyzed for Biomarker Overexpression

| Marker | Bad | Good | Total |
|---|---|---|---|
| E2F1 | 66 | 106 | 172 |
| MUC-1 | 65 | 108 | 173 |
| NDRG1 (CAP 43) | 75 | 115 | 190 |
| p21ras | 72 | 109 | 181 |
| p53 | 71 | 121 | 192 |
| Phospho-p27 | 70 | 115 | 185 |
| PSMB9 | 74 | 118 | 192 |
| SLPI | 75 | 118 | 193 |
| src | 66 | 108 | 174 |

TABLE 12 Number of Tissue Sections Analyzed for Conventional Breast Cancer Markers

| Marker | Bad | Good | Total |
|---|---|---|---|
| CerbB2 | 69 | 122 | 191 |
| ER | 70 | 123 | 193 |
| Ki67 | 69 | 124 | 193 |
| PR | 69 | 123 | 192 |

Segmentation and Dispatchers Setup

In order to bring MARS analysis closer to the pathologist's manner of characterizing slides, only cells considered as being at least 1+ were selected. Table 13 summarizes the segmentation setup used in MARS for this analysis. This segmentation setup led to the detection of the most stained cells. Segmentation and dispatcher transmittance thresholds were based upon cytologists' input. The segmentation setup was pixel-based using 20× images captured with a Dage camera on the computer TPO-RDLAB5.

TABLE 13 Main segmentation setup parameters

| Parameter | Cell | Nucleus | Cytoplasm | Membrane |
|---|---|---|---|---|
| Size (pixels) | 68 | 32 | | |
| Hematoxylin Contribution | | 80% | 100% | 0% |
| Hematoxylin Max. Transmittance | | 80% | 100% | 100% |
| DAB Contribution | | 30% | 100% | 0% |
| DAB Max. Transmittance | | 90% | 100% | 100% |

In order to assign the selected cells to categories based upon the biomarker staining intensity in the targeted cellular compartment, valid cells resulting from segmentation were dispatched into 3 categories: 1 (in MARS: NegRef), 2 (in MARS: Test) and 3 (in MARS: PosRef). Table 14 provides MARS features and their values used to perform this dispatch, as a function of the cellular localization of the marker.

TABLE 14 Dispatcher Settings Resulting in the Assignment of Selected Cells into Category 1, 2 or 3

| Marker Targeted Cell Compartment | Dispatch Step | Cell(s) | If MARS Feature Is > Value (Transmittance) | Category Is |
|---|---|---|---|---|
| Nucleus | 1 | All | NUCL_DYE2_OD_MEAN > 0.161151 (69%) | (2 or 3) otherwise 1 |
| Nucleus | 2 | 2 and 3 | NUCL_DYE2_OD_MEAN > 0.29243 (51%) | 3 otherwise 2 |
| Cytoplasmic | 1 | All | CYTO_DYE2_OD_MEAN > 0.173925 (67%) | (2 or 3) otherwise 1 |
| Cytoplasmic | 2 | 2 and 3 | CYTO_DYE2_OD_MEAN > 0.29243 (51%) | 3 otherwise 2 |
| Membrane | 1 | All | CYTO_DYE2_OD_MEAN > 0.06048 (87%); MEMB_DYE2_OD_MEAN > 0.200659 (63%); MEMB_AREA > 150 pix. | (2 or 3) otherwise 1 |
| Membrane | 2 | 2 and 3 | CYTO_DYE2_OD_MEAN > 0.173925 (67%); MEMB_DYE2_OD_MEAN > 0.29243 (51%); MEMB_AREA > 150 pix. | 3 otherwise 2 |
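One possible reading of the two-step dispatch in Table 14, shown here for a nucleus-targeted marker only, is sketched below in Python. It assumes that step 1 is applied to all cells and step 2 only to cells that passed step 1, and that the percentages in parentheses are transmittances related to optical density by T = 10^(-OD). The function and constant names are hypothetical; only the two threshold values come from Table 14.

```python
def transmittance_percent(optical_density: float) -> float:
    """Convert optical density to percent transmittance (T = 10 ** -OD)."""
    return 100.0 * 10.0 ** (-optical_density)


# Thresholds on NUCL_DYE2_OD_MEAN for a nucleus-targeted marker (Table 14).
STEP1_OD = 0.161151  # about 69% transmittance
STEP2_OD = 0.29243   # about 51% transmittance


def dispatch_nuclear_cell(nucl_dye2_od_mean: float) -> int:
    """Assign a cell to category 1 (NegRef), 2 (Test) or 3 (PosRef)."""
    if nucl_dye2_od_mean <= STEP1_OD:  # fails step 1: category 1
        return 1
    if nucl_dye2_od_mean > STEP2_OD:   # passes step 2: category 3
        return 3
    return 2                           # passes step 1 only: category 2
```

The cytoplasmic case would follow the same pattern with CYTO_DYE2_OD_MEAN and its own thresholds; the membrane case involves several features per step and is not covered by this sketch.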

An evaluation of category 0 (corresponding to the “expected number of non-stained cells”) was performed. The approximate number of these cells was computed using the average tumor cell area (1100 pixels, as estimated from the MARS feature called CELL_AREA), obtained from the area of cells with a staining intensity of 1, 2, or 3:

$N_{1} = N_{NegRef}$

$N_{2} = N_{Test}$

$N_{3} = N_{PosRef}$

$N_{Total} = \max\left(N_{1} + N_{2} + N_{3},\ \frac{FOCUS\_AREA}{1100}\right)$

$N_{0} = \max\left(0,\ N_{Total} - N_{1} - N_{2} - N_{3}\right)$

Using N₀, N₁, N₂ and N₃, the percentages of cells staining 0, 1, 2, and 3 were computed. Table 15 gives the names of these new features.

TABLE 15 Percentage Summary Features

| Percentage of cells from categories | Feature Name |
|---|---|
| 0 | CELL_PERCENT_0 |
| 1 | CELL_PERCENT_1 |
| 2 | CELL_PERCENT_2 |
| 3 | CELL_PERCENT_3 |
| 0 and 1 | CELL_PERCENT_01 |
| 2 and 3 | CELL_PERCENT_23 |
| 0, 1 and 2 | CELL_PERCENT_012 |
| 1, 2 and 3 | CELL_PERCENT_123 |

These features were computed as a simple percentage, e.g., for CELL_PERCENT_0:

$\mathrm{CELL\_PERCENT\_0} = \frac{N_{0}}{N_{Total}} \times 100$
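The computation of the percentage summary features can be sketched in Python as follows. The sketch assumes that the total cell count is the larger of the number of detected cells (categories 1 to 3) and the expected cell count FOCUS_AREA/1100; the function name `percent_features` and the example counts are hypothetical.

```python
AVERAGE_CELL_AREA = 1100  # pixels, estimated from the MARS CELL_AREA feature


def percent_features(n1: int, n2: int, n3: int, focus_area: float) -> dict:
    """Compute the CELL_PERCENT_* features from category counts.

    Assumes N_Total = max(N1 + N2 + N3, FOCUS_AREA / 1100) and N_Total > 0.
    """
    counted = n1 + n2 + n3
    n_total = max(counted, focus_area / AVERAGE_CELL_AREA)
    n0 = max(0.0, n_total - counted)  # expected number of non-stained cells
    n = {"0": n0, "1": n1, "2": n2, "3": n3}
    groups = ["0", "1", "2", "3", "01", "23", "012", "123"]
    return {
        f"CELL_PERCENT_{g}": 100.0 * sum(n[c] for c in g) / n_total
        for g in groups
    }


# Hypothetical counts: 30 + 15 + 5 detected cells in a 110,000-pixel focus.
features = percent_features(30, 15, 5, focus_area=110_000)
```

With these hypothetical inputs the expected total is 100 cells, so half of the cells fall in category 0 and CELL_PERCENT_123 is 50.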

This study was run with MARS features, these new summary features, and the pathologist scores. USER_TYPE is the name of the MARS feature for pathologist scoring only.

Multiple Biomarker Analysis

In order to obtain an improved sensitivity/specificity couple, data from multiple biomarkers were combined and analyzed. The specificity target for each biomarker was dependent on the number of biomarkers combined. When a combination is deemed positive if any one marker is positive, and the markers' false positives are independent, the specificity of the combination is the product of the individual specificities. As an example, a combination of 3 biomarkers will reach 80% specificity if each individual marker specificity is at least $0.8^{1/3} \approx 0.93$ (93%). Table 16 provides the list of required specificity values based on the number of biomarkers in the combination, from 1 to 9.

TABLE 16 Minimum specificity required per biomarker when an overall specificity of 0.8 is targeted for a given combination of up to 9 biomarkers

| Number of Markers in Combination | Specificity Required Per Marker |
|---|---|
| 1 | 0.8 |
| 2 | 0.894427 |
| 3 | 0.928318 |
| 4 | 0.945742 |
| 5 | 0.956352 |
| 6 | 0.963492 |
| 7 | 0.968625 |
| 8 | 0.972492 |
| 9 | 0.975511 |
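The per-marker requirement in Table 16 is the n-th root of the overall target. A short Python sketch of this calculation follows; it assumes, as the n-th-root formula implies, that the combined specificity is the product of the individual specificities. The function name `required_specificity` is an illustrative assumption.

```python
def required_specificity(target: float, n_markers: int) -> float:
    """Per-marker specificity needed so that the product reaches `target`."""
    return target ** (1.0 / n_markers)


# Reproduce the per-marker requirements for an overall target of 0.8.
for n in range(1, 10):
    print(n, round(required_specificity(0.8, n), 6))
```

The requirement rises quickly with the number of markers, which is why each added marker has to be highly specific for the combination to remain useful.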

Data Interpretation

As used herein, the term “marker performance” encompasses the complete experimental performance that relates to the true biological discriminative power of the marker, as well as to the origin and storage of the biological samples, the staining protocols, the scanning process, the imaging and data mining procedures.

Results

1. Per Biomarker

A. Pathologist Scoring

The threshold giving the best sensitivity/specificity couple was computed when considering only the pathologist scores (USER_TYPE in MARS). The most significant results are summarized in Table 17 when a specificity of 0.75 was targeted.

TABLE 17 Best Sensitivity and Specificity Couple for Biomarkers (Pathologist Scoring)

| Marker | Threshold | Sensitivity | Specificity |
|---|---|---|---|
| E2F1 | 1.75 | 0.30 | 0.69 |
| MUC-1 | 0.75 | 0.21 | 0.61 |
| NDRG1 | 2.5 | 0.28 | 0.74 |
| p21ras | 0.25 | 0.05 | 0.98 |
| p53 | 0.5 | 0.29 | 0.74 |
| Phospho-p27 | 1.25 | 0.17 | 0.73 |
| PSMB9 | 0.75 | 0.10 | 0.94 |
| SLPI | 2.5 | 0.18 | 0.63 |
| src | 2.5 | 0.10 | 0.67 |

The threshold giving the best sensitivity/specificity couple was also computed when considering only the pathologist evaluation for conventional markers of the breast panel (i.e., ER, PR, Ki67, and Her2/neu (CerbB2)). The most significant results are summarized in Table 18 when a specificity of 0.75 was targeted.

TABLE 18 Best Sensitivity and Specificity Couple for Conventional Markers (Pathologist Scoring)

| Marker | Threshold | Sensitivity | Specificity |
|---|---|---|---|
| CerbB2 | 2.5 | 0.17 | 0.85 |
| ER | 0.5 | 0.31 | 0.72 |
| Ki67 | 0.25 | 0.14 | 0.89 |
| PR | 2.5 | 0.23 | 0.69 |

For every biomarker and conventional breast cancer marker (i.e., ER, PR, Ki67, and Her2/neu (CerbB2)), the feature and threshold giving the best sensitivity/specificity couple was computed for the pathologist evaluation alone (USER_TYPE). Corresponding receiver operating characteristics (ROC) curves were prepared (data not shown).

B. Single-Feature Analysis

For every biomarker, the feature and threshold giving the best sensitivity/specificity couple were computed when considering every MARS feature defined as being meaningful with respect to the analyzed biomarker. Corresponding ROC curves were prepared (data not shown). The feature and threshold giving the best result for each biomarker are summarized in Table 19 when a specificity of 0.75 was targeted.

TABLE 19 Best Sensitivity and Specificity Couple for Each Biomarker Obtained from MARS Features (Single Feature Algorithm)

| Marker | Feature | Threshold | Sens. | Spec. | Rule* |
|---|---|---|---|---|---|
| E2F1 | CELL_PERCENT_01 | 97.20165 | 0.575758 | 0.716981 | 8 |
| MUC-1 | CELL_PERCENT_1 | 21.4664 | 0.415385 | 0.685185 | 1 |
| NDRG1 | CELL_PERCENT_1 | 16.97263 | 0.386667 | 0.713043 | 8 |
| p21ras | CELL_PERCENT_123 | 61.04522 | 0.402778 | 0.724771 | 1 |
| p53 | CELL_PERCENT_3 | 0.08369 | 0.422535 | 0.702479 | 8 |
| phospho-p27 | CELL_PERCENT_1 | 0.442341 | 0.528571 | 0.643478 | 8 |
| PSMB9 | CELL_PERCENT_123 | 30.42549 | 0.391892 | 0.711864 | 1 |
| SLPI | CELL_PERCENT_123 | 0.610623 | 0.493333 | 0.694915 | 1 |
| src | CELL_PERCENT_1 | 36.80664 | 0.409091 | 0.731481 | 1 |

*A decision rule of 1 means that patients above the threshold are considered as being positive (i.e., TRUE POSITIVE if bad actual clinical outcome), whereas a decision rule of 8 means that patients above the threshold are considered as being negative (i.e., FALSE NEGATIVE if bad actual clinical outcome).
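The two decision rules in the Table 19 footnote can be illustrated with the following Python sketch. It assumes that, under rule 8, patients at or below the threshold are called positive, which the footnote implies but does not state; the function names and the toy values are hypothetical.

```python
def call_positive(value: float, threshold: float, rule: int) -> bool:
    """Apply a single-feature decision rule from the Table 19 footnote.

    Rule 1: values above the threshold are called positive.
    Rule 8: values above the threshold are called negative.
    """
    above = value > threshold
    if rule == 1:
        return above
    if rule == 8:
        return not above
    raise ValueError(f"unsupported rule: {rule}")


def sens_spec(values, bad_outcomes, threshold, rule):
    """Sensitivity and specificity for one feature/threshold/rule triple."""
    calls = [call_positive(v, threshold, rule) for v in values]
    pairs = list(zip(calls, bad_outcomes))
    tp = sum(c and b for c, b in pairs)
    fn = sum((not c) and b for c, b in pairs)
    tn = sum((not c) and (not b) for c, b in pairs)
    fp = sum(c and (not b) for c, b in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical feature values and outcomes (True = bad outcome).
values = [10.0, 40.0, 5.0, 8.0]
bad = [True, True, False, False]
result_rule1 = sens_spec(values, bad, threshold=9.0, rule=1)
result_rule8 = sens_spec(values, bad, threshold=9.0, rule=8)
```

On this toy data rule 1 separates the two groups perfectly, while rule 8 inverts every call, which shows why the rule has to be recorded alongside the threshold.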

C. Multiple Feature Analysis

Every percent summary feature was combined two-by-two, and thresholds giving the best sensitivity/specificity couple were computed. The most significant results for each biomarker are provided in Table 20 for a target specificity of 0.75.

TABLE 20 Best Sensitivity and Specificity Couple for Each Biomarker Obtained from MARS features (Multiple Feature Algorithm)

| Marker | Feature 1 | Feature 2 | Threshold 1 | Threshold 2 | Sensitivity | Specificity | Rule* |
|---|---|---|---|---|---|---|---|
| E2F1 | CELL_PERCENT_2 | CELL_PERCENT_3 | 2.386008 | 1.275799 | 0.575758 | 0.745283 | 7 |
| MUC-1 | CELL_PERCENT_1 | CELL_PERCENT_3 | 21.26747 | 0.311046 | 0.507692 | 0.835135 | 9 |
| NDRG1 | CELL_PERCENT_1 | CELL_PERCENT_23 | 32.96842 | 0.125389 | 0.48 | 0.713043 | 9 |
| p21ras | CELL_PERCENT_3 | CELL_PERCENT_01 | 0.1695 | 99.97016 | 0.458333 | 0.715596 | 6 |
| p53 | CELL_PERCENT_1 | CELL_PERCENT_123 | 1.667596 | 17.22644 | 0.492958 | 0.710744 | 2 |
| phospho-p27 | CELL_PERCENT_1 | CELL_PERCENT_01 | 0.456608 | 100 | 0.5 | 0.695652 | 8 |
| PSMB9 | CELL_PERCENT_1 | CELL_PERCENT_123 | 47.63697 | 20.25946 | 0.486486 | 0.720339 | 4 |
| SLPI | CELL_PERCENT_0 | CELL_PERCENT_1 | 62.11591 | 0.414484 | 0.573333 | 0.728814 | 1 |
| src | CELL_PERCENT_2 | CELL_PERCENT_3 | 16.31145 | 0.082021 | 0.545455 | 0.712963 | 4 |

*Decision rules correspond to quadrant assignment in the two-feature space.

2. Combinations of Biomarkers

The complete set of possible combinations of 1 to 9 markers was investigated using successively: the pathologist scoring, one MARS feature, and two MARS features per marker. The sensitivity and specificity were computed according to an FDA-like and a sequence-based interpretation method. “FDA-like” means that any marker ON (1) leads to a bad outcome decision. That is, a combination of markers is considered positive if at least one marker is positive. The sequence-based interpretation relies on sensitivity/specificity of each specific ON/OFF combination. The results obtained with pathologist scoring (Table 21) and percentage features evaluation (Table 22) are presented below.
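The FDA-like rule described above, together with a first step toward the sequence-based interpretation, can be sketched in Python as follows. The text does not specify how a decision is derived from each ON/OFF combination, so the sketch only tabulates outcomes per pattern; the function names and the example data are hypothetical.

```python
from collections import defaultdict


def fda_like(calls: tuple) -> bool:
    """FDA-like rule: any marker ON (1) leads to a bad-outcome decision."""
    return any(calls)


def tabulate_patterns(patients: list) -> dict:
    """Count bad and good outcomes for each ON/OFF pattern of marker calls."""
    table = defaultdict(lambda: {"bad": 0, "good": 0})
    for calls, bad_outcome in patients:
        table[tuple(calls)]["bad" if bad_outcome else "good"] += 1
    return dict(table)


# Hypothetical ON/OFF calls for two markers, not study data.
patients = [
    ((1, 0), True),
    ((1, 0), False),
    ((0, 0), False),
    ((1, 1), True),
]
patterns = tabulate_patterns(patients)
```

A sequence-based decision could then be assigned per pattern from such a table, whereas the FDA-like rule treats every pattern other than all-OFF as positive.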

TABLE 21 Best Sensitivity/Specificity Couples for Biomarker Combinations Using Different Targeted Specificities (75% and 95%) and Different Interpretation Algorithms (Pathologist Scoring) Target 9 Spec Markers 1 Marker 2 Markers 3 Markers 4 Markers 5 Markers 6 Markers 7 Markers 8 Markers Markers FDA 0.75 spec 0.74 0.52 sens 0.29 0.58 SEQUENCE spec 0.84 0.84 0.80 0.79 0.80 0.82 0.82 0.77 sens 0.24 0.26 0.31 0.34 0.34 0.31 0.32 0.30 FDA 0.95 spec 0.90 0.86 0.88 0.84 sens 0.13 0.23 0.25 0.30 SEQUENCE spec 0.86 0.85 0.85 0.86 0.85 sens 0.30 0.31 0.31 0.30 0.30 *Each patient is characterized by the pathologist score.

TABLE 22 Best Sensitivity/Specificity Couples for Biomarker Combinations Using Different Targeted Specificities (75% and 95%) and Different Interpretation Algorithms (Percentage Features) All Povs Target 1 2 3 4 5 6 7 8 9 % Features Spec Markers Marker Markers Markers Markers Markers Markers Markers Markers Markers 1 FDA 0.75 spec 0.71 0.53 sens 0.57 0.80 SEQUENCE spec 0.96 0.88 0.80 0.81 0.81 0.81 0.80 0.80 sens 0.33 0.47 0.60 0.58 0.55 0.62 0.58 0.59 FDA 0.95 spec 0.93 0.87 0.83 0.81 sens 0.18 0.34 0.44 0.50 SEQUENCE spec 0.82 0.81 0.81 0.81 0.82 sens 0.48 0.49 0.49 0.49 0.46 2 FDA 0.75 spec 0.74 0.55 sens 0.57 0.80 SEQUENCE spec 0.97 0.85 0.82 0.86 0.84 0.84 0.86 0.83 sens 0.39 0.61 0.66 0.65 0.69 0.71 0.73 0.71 FDA 0.95 spec 0.94 0.90 0.83 0.82 0.81 sens 0.30 0.50 0.63 0.72 0.76 SEQUENCE spec 0.83 0.81 0.80 0.81 0.80 sens 0.73 0.70 0.70 0.69 0.69 *Each patient is characterized by the percentage of 1, 2 and 3 staining cells.

Specific examples for combinations of four and six biomarkers are provided in Example 5.

Analysis without Data from Infiltrating Lobular Cancer (ILC) Patients

The patient population described in Table 8 was further subdivided based on diagnosis. Specifically, data from patients with infiltrating lobular carcinoma (ILC) was excluded, and the above analysis was performed on the resulting data set. Details of the patient population analyzed in this study are provided in Table 23.

TABLE 23 Patient Population Analyzed (Without ILC Patients)

| Stage | Good | Bad | All |
|---|---|---|---|
| T1N0 | 56 | 19 | 75 |
| T1N1 | 6 | 7 | 13 |
| T2N0 | 54 | 33 | 87 |
| T3N0 | 6 | 7 | 13 |
| Totals | 122 | 66 | 188 |

Results

1. Per Biomarker

A. Pathologist Scoring

TABLE 24 Best Sensitivity and Specificity Couple for Biomarkers without ILC Patient Data (Pathologist Scoring)

| Marker | Threshold | Sensitivity | Specificity |
|---|---|---|---|
| E2F1 | 1.75 | 0.29 | 0.69 |
| MUC-1 | 0.75 | 0.26 | 0.80 |
| NDRG-1 | 2.5 | 0.26 | 0.72 |
| p21ras | 0.25 | 0.03 | 0.98 |
| p53 | 0.5 | 0.29 | 0.75 |
| Phospho-p27 | 1.25 | 0.16 | 0.71 |
| PSMB9 | 0.75 | 0.12 | 0.93 |
| SLPI | 2.5 | 0.22 | 0.81 |
| src | 2.5 | 0.10 | 0.86 |

TABLE 25 Best Sensitivity and Specificity Couple for Conventional Markers without ILC Patient Data (Pathologist Scoring)

| Marker | Threshold | Sensitivity | Specificity |
|---|---|---|---|
| CerbB2 | 2.5 | 0.16 | 0.85 |
| ER | 0.5 | 0.38 | 0.71 |
| Ki67 | 0.25 | 0.14 | 0.88 |
| PR | 2.5 | 0.23 | 0.69 |

B. Single-Feature Analysis

TABLE 26 Best Sensitivity and Specificity Couple for Each Biomarker Obtained from MARS Features without ILC Patient Data (Single Feature Algorithm)

| Marker | Feature | Threshold | Sens. | Spec. | Rule* |
|---|---|---|---|---|---|
| E2F1 | CELL_PERCENT_23 | 3.19079 | 0.58182 | 0.73469 | 1 |
| MUC-1 | CELL_PERCENT_23 | 8.437 | 0.38462 | 0.71717 | 8 |
| NDRG1 | CELL_PERCENT_123 | 26.13234 | 0.39683 | 0.69811 | 8 |
| p21ras | CELL_PERCENT_123 | 61.04522 | 0.45763 | 0.72277 | 1 |
| p53 | CELL_PERCENT_3 | 0.08289 | 0.41379 | 0.71171 | 8 |
| phospho-p27 | CELL_PERCENT_123 | 0.44587 | 0.49153 | 0.64486 | 8 |
| PSMB9 | CELL_PERCENT_123 | 30.42545 | 0.40323 | 0.71560 | 1 |
| SLPI | CELL_PERCENT_123 | 0.57594 | 0.53226 | 0.70370 | 1 |
| src | CELL_PERCENT_23 | 13.08501 | 0.43636 | 0.66000 | 8 |

*A decision rule of 1 means that patients above the threshold are considered as being positive (i.e., TRUE POSITIVE if bad actual clinical outcome), whereas a decision rule of 8 means that patients above the threshold are considered as being negative (i.e., FALSE NEGATIVE if bad actual clinical outcome).

C. Multiple Feature Analysis

TABLE 27 Best Sensitivity and Specificity Couple for Each Biomarker Obtained from MARS Features without ILC Patient Data (Multiple Feature Algorithm) Marker Feature 1 Feature 2 Threshold 1 Threshold 2 Sensitivity Specificity Rule E2F1 CELL_PERCENT_2 CELL_PERCENT_3 2.47761 1.2758 0.61818 0.7449 7 MUC-1 CELL_PERCENT_1 CELL_PERCENT_2 9.6658 13.2244 0.51923 0.68687 9 NDRG1 CELL_PERCENT_0 CELL_PERCENT_123 28.32391 16.95268 0.49206 0.70755 6 p21ras CELL_PERCENT_3 CELL_PERCENT_01 0.1695 99.97219 0.49153 0.72277 6 p53 CELL_PERCENT_0 CELL_PERCENT_3 48.61018 0.07805 0.46552 0.71171 6 phospho-p27 CELL_PERCENT_1 CELL_PERCENT_01 0.50369 100 0.49153 0.69159 8 PSMB9 CELL_PERCENT_123 CELL_PERCENT_01 30.42545 99.09092 0.45161 0.7156 11 SLPI CELL_PERCENT_0 CELL_PERCENT_123 62.11591 0.40094 0.58065 0.75 1 src CELL PERCENT 2 CELL PERCENT 3 16.31145 0.08202 0.52727 0.72 4 *Decision rules correspond to quadrant affection in the 2 features space. D. Variations Between Analyses: All Patients v. Without ILC Patients

Variations in the sensitivity and specificity values obtained on a per biomarker basis with the analysis of the complete patient population (Table 8) and the population without ILC patients (Table 23) were determined. The results are presented below in Table 28. The sum column (d²) gives the difference of quadratic distance on an ROC curve, i.e., the overall gain in sensitivity and specificity.

TABLE 28 Variations in Sensitivity and Specificity Obtained with the Complete Patient Population and Without ILC Patients (Per Biomarker)
              Pathologist Scoring       Single-Feature            Multi-Features
Marker        Sens.   Spec.   d²        Sens.   Spec.   d²        Sens.   Spec.   d²
E2F1          ↓       —       −0.004    ↑       ↑       0.018     ↑       ↓       0.026
MUC-1         ↑       ↓       0.004     ↓       ↑       0.013     ↑       ↑       0.006
NDRG1         ↓       ↓       −0.026    ↑       ↓       −0.006    ↑       ↓       0.002
p21ras        ↓       —       −0.001    ↑       ↓       0.026     ↑       ↑       0.024
p53           —       ↑       0.009     ↓       ↑       0.003     ↓       ↑       −0.015
phospho-p27   ↓       ↓       −0.022    ↓       ↑       −0.022    ↓       ↓       −0.008
PSMB9         ↑       ↓       −0.008    ↑       ↑       0.000     ↓       ↓       −0.023
SLPI          ↑       ↓       −0.010    ↑       ↑       0.030     ↑       ↑       0.021
src           —       ↓       −0.010    ↓       ↓       −0.047    ↓       ↑       −0.005

2. Combinations of Biomarkers

A. Pathologist Scoring

TABLE 29 Best Sensitivity/Specificity Couples for Biomarker Combinations without ILC Patient Data Using Different Targeted Specificities (75% and 95%) and Different Interpretation Algorithms (Pathologist Scoring) Target 9 Spec. Markers 1 Marker 2 Markers 3 Markers 4 Markers 5 Markers 6 Markers 7 Markers 8 Markers Markers FDA 0.75 spec 0.75 0.63 SEQUENCE sens 0.20 0.67 spec 0.88 0.83 0.84 0.84 0.90 0.85 0.83 0.77 sens 0.24 0.35 0.37 0.35 0.36 0.35 0.37 0.27 *Each patient is characterized by the pathologist score.

B. Percentage Features Analysis

TABLE 30 Best Sensitivity/Specificity Couples for Biomarker Combinations without ILC Patients Using Different Targeted Specificities (75% and 95%) and Different Interpretation Algorithms (Percentage Features) All Povs Target 1 2 3 4 5 6 7 8 9 % Features Spec Markers Marker Markers Markers Markers Markers Markers Markers Markers Markers 1 FDA 0.79 spec 0.73 0.54 SEQUENCE sens 0.58 0.22 spec 0.95 0.86 0.81 0.68 0.85 0.82 0.92 0.78 sens 0.35 0.46 0.56 0.58 0.38 0.82 0.57 0.50 2 FDA 0.35 spec 0.74 0.55 SEQUENCE sens 0.80 0.63 spec 0.36 0.67 0.64 0.71 0.70 0.72 0.85 0.70 sens 0.35 0.67 0.94 0.75 0.70 0.72 0.56 0.70 *Each patient is characterized by the percentage of 1, 2 and 3 staining cells.

Table 30 shows an increase in specificity (0.88 compared to 0.81, see Table 28) when considering a 5 biomarker combination excluding ILC patients with a single percent feature. An increase in sensitivity was observed when using 2 features (0.71 vs. 0.65, see Table 28) for a 5 biomarker sequence analysis when excluding ILC patients from the study.

C. Variations Between Analyses: All Patients v. without ILC Patients

Variations in the sensitivity and specificity values obtained for biomarker combinations with the analysis of the complete patient population and the population without ILC patients were determined. The results are presented below in Table 31. The sum column (d²) gives the difference of quadratic distance on an ROC curve, i.e., the overall gain in sensitivity and specificity. A slight gain in performance for a 5 biomarker sequence analysis using one or two percentage features was observed when ILC patients were excluded from the study.

TABLE 31 Variations in Sensitivity and Specificity Obtained with Complete Patient Population and Without ILC Patients (Biomarker Combinations) All Povs Target 1 2 3 4 5 6 7 8 9 % Features Spec Markers Marker Markers Markers Markers Markers Markers Markers Markers Markers 1 FDA 0.75 d2 0.02 0.02 SEQUENCE d2 0.00 0.01 0.00 0.06 0.04 0.01 0.01 −0.07 2 FDA 0.75 d2 0.02 0.02 SEQUENCE d2 −0.01 0.00 0.02 0.02 −0.01 −0.01 −0.06 −0.04

Example 5 Specific Biomarker Combinations

The data obtained in the study described above in Example 4 were further analyzed, and specific biomarker combinations were considered. The results obtained with a combination of four (SLPI/p21ras/E2F1/src) and six (SLPI/p21ras/PSMB9/E2F1/src/phospho-p27) biomarkers are presented below.

Four Biomarker Combination: SLPI/p21ras/E2F1/src

Analysis was performed using only one percentage feature for SLPI, p21ras, E2F1, and src with the thresholds and decision rule defined in Table 32. A 60% sensitivity and an 80% specificity were obtained using the rule: if E2F1 was ON (i.e., 1) and not the only biomarker to be ON, then the patient was considered bad outcome; otherwise, the patient was considered good outcome. FIG. 1 shows the distribution of the percentage feature as a function of bad and good outcome patients for E2F1. Using a threshold of 2.46%, sensitivity and specificity values of 0.54 and 0.75, respectively, were obtained.

TABLE 32 Percentage Summary Features for Four Biomarker Analysis
Marker   Feature           Threshold   Rule (1 if)
SLPI     CELL_PERCENT_01   99.887874   <
p21ras   CELL_PERCENT_0    35.642851   <
E2F1     CELL_PERCENT_2    2.463659    >
src      CELL_PERCENT_1    37.624326   >

A sequence-based interpretation approach was used to analyze the four biomarker combination. The sequence-based decision rule used was: if E2F1 was ON (i.e. 1) and not the only biomarker to be ON, then the patient was considered bad outcome; otherwise, considered good outcome. The sensitivity and specificity values for all of the possible combinations of the four biomarkers are provided in Table 33. The ROC curve obtained using the sequence interpretation approach for the SLPI/p21ras/E2F1/src combination was prepared (data not shown).

TABLE 33 Sensitivity and Specificity Couples Using Sequence-based Interpretation Approach for SLPI, p21ras, E2F1 and SRC Combination
SLPI-p21ras-E2F1-src
Sequence   CumulBad   CumulGood   Sensitivity   Specificity
S1111      4          0           0.069         1
S1011      7          0           0.1207        1
S1110      12         0           0.2069        1
S0111      14         8           0.2414        0.9184
S1010      22         12          0.3793        0.8776
S1101      26         14          0.4483        0.8571
S0011      31         16          0.5345        0.8367
S0110      35         19          0.6034        0.8061
S1001      37         24          0.6379        0.7551
S1100      37         26          0.6379        0.7347
S0010      39         37          0.6724        0.6224
S0101      41         40          0.7069        0.5918
S1000      46         56          0.7931        0.4286
S0001      49         63          0.8448        0.3571
S0100      52         71          0.8966        0.2755
S0000      58         98          1             0
*A sequence S0110 is read as follows: SLPI = OFF/p21ras = ON/E2F1 = ON/src = OFF.

An interpretation based on E2F1 alone gave a sensitivity and specificity of 54% and 75%, respectively. A sensitivity and specificity of 60% and 80%, respectively, were obtained using the sequence-based algorithm defined above (i.e., if E2F1 was ON (i.e., 1) and not the only biomarker to be ON, then the patient was considered bad outcome; otherwise, the patient was considered good outcome).
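The four-biomarker interpretation can be sketched as follows, using the thresholds and comparison directions of Table 32 and the sequence rule as stated in the text. The function names and the example patient are hypothetical; the group totals of 58 bad-outcome and 98 good-outcome patients are read from the final row of Table 33.

```python
# (feature, threshold, comparison) per marker, as listed in Table 32.
TABLE_32 = {
    "SLPI":   ("CELL_PERCENT_01", 99.887874, "<"),
    "p21ras": ("CELL_PERCENT_0", 35.642851, "<"),
    "E2F1":   ("CELL_PERCENT_2", 2.463659, ">"),
    "src":    ("CELL_PERCENT_1", 37.624326, ">"),
}
ORDER = ("SLPI", "p21ras", "E2F1", "src")  # bit order used in Table 33


def marker_on(marker, value):
    """Return True if the marker is ON (1) for this feature value."""
    _, threshold, comparison = TABLE_32[marker]
    return value < threshold if comparison == "<" else value > threshold


def sequence(features):
    """Encode a patient as a sequence such as 'S0110'."""
    return "S" + "".join("1" if marker_on(m, features[m]) else "0" for m in ORDER)


def bad_outcome(seq):
    """E2F1 ON and not the only biomarker ON -> bad outcome."""
    bits = seq[1:]
    return bits[2] == "1" and bits.count("1") > 1


def sens_spec(cumul_bad, cumul_good, total_bad=58, total_good=98):
    """Convert cumulative counts from Table 33 into (sensitivity, specificity)."""
    return cumul_bad / total_bad, 1 - cumul_good / total_good


# Hypothetical patient: one feature value per marker (not study data).
patient = {"SLPI": 95.0, "p21ras": 50.0, "E2F1": 3.0, "src": 10.0}
seq = sequence(patient)
sens, spec = sens_spec(35, 19)  # counts from the S0110 row of Table 33
```

Here `sens_spec(35, 19)` reproduces, to rounding, the 0.6034 and 0.8061 values listed for the S0110 row.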

Six Biomarker Combination: SLPI/p21ras/E2F1/src/PSMB9/phospho-p27

Analysis was performed using only one percentage feature for a six biomarker combination of SLPI, p21ras, E2F1, src, PSMB9, and phospho-p27 with the thresholds and decision rules defined in Table 34.

TABLE 34 Percentage Summary Features for Six Biomarker Analysis
MarkerName    Feature            Threshold   Sensitivity   Specificity   Rule (1 if)
SLPI          CELL_PERCENT_123   0.576       53.2%         70.4%         >
p21ras        CELL_PERCENT_123   61.045      45.8%         72.3%         >
E2F1          CELL_PERCENT_23    3.191       58.2%         73.5%         >
PSMB9         CELL_PERCENT_123   30.425      40.3%         71.6%         >
src           CELL_PERCENT_23    13.085      43.6%         66.0%         <
phospho-p27   CELL_PERCENT_123   0.446       49.2%         64.5%         <

A sequence-based interpretation approach was used to analyze the six biomarker combination. The sequence-based decision rule used was: if E2F1 was ON (i.e., 1) and either SLPI or p21ras, or E2F1 and any 2 biomarkers, or SLPI and any 2 biomarkers, or any 4 biomarkers or more were ON, then the patient was considered bad outcome; otherwise, the patient was considered good outcome. The sensitivity and specificity values for all of the possible combinations of the six biomarkers of interest are provided in Table 35. The ROC curve obtained using the sequence interpretation approach for the SLPI/p21ras/E2F1/PSMB9/src/phospho-p27 combination is shown in FIG. 2.

TABLE 35 Sensitivity and Specificity Couples Using Sequence-based Interpretation Approach for SLPI, p21ras, E2F1, PSMB9, SRC, and Phospho-p27 Combination SLPI-p21ras-E2F1-PSMB9-src-phospho-p27 Sequence CumulBad CumulGood Sensitivity Specificity S111111 1 0 0.0208 1 S111101 2 0 0.0417 1 S111011 2 0 0.0417 1 S111110 2 0 0.0417 1 S101111 3 0 0.0625 1 S111001 4 0 0.0833 1 S111100 8 0 0.1667 1 S011111 8 0 0.1667 1 S111010 9 0 0.1875 1 S101101 11 0 0.2292 1 S101011 12 0 0.25 1 S110111 12 0 0.25 1 S101110 12 0 0.25 1 S011101 12 0 0.25 1 S111000 12 0 0.25 1 S011011 13 1 0.2708 0.9885 S011110 13 1 0.2708 0.9885 S101001 13 2 0.2708 0.977 S110101 14 2 0.2917 0.977 S101100 15 3 0.3125 0.9655 S001111 17 3 0.3542 0.9655 S110011 19 4 0.3958 0.954 S101010 21 4 0.4375 0.954 S110110 21 4 0.4375 0.954 S011001 21 5 0.4375 0.9425 S011100 22 8 0.4583 0.908 S011010 23 10 0.4792 0.8851 S100111 23 10 0.4792 0.8851 S001101 23 11 0.4792 0.8736 S110001 23 11 0.4792 0.8736 S101000 25 13 0.5208 0.8506 S001011 26 14 0.5417 0.8391 S110100 27 14 0.5625 0.8391 S010111 27 15 0.5625 0.8276 S001110 28 16 0.5833 0.8161 S110010 28 16 0.5833 0.8161 S011000 31 19 0.6458 0.7816 S100101 31 19 0.6458 0.7816 S100011 33 20 0.6875 0.7701 S100110 34 20 0.7083 0.7701 S001001 34 21 0.7083 0.7586 S010101 35 22 0.7292 0.7471 S001100 35 26 0.7292 0.7011 S110000 36 28 0.75 0.6782 S010011 37 29 0.7708 0.6667 S001010 37 30 0.7708 0.6552 S010110 37 30 0.7708 0.6552 S100001 37 34 0.7708 0.6092 S100100 38 38 0.7917 0.5632 S000111 40 39 0.8333 0.5517 S100010 40 45 0.8333 0.4828 S010001 41 46 0.8542 0.4713 S001000 41 46 0.8542 0.4713 S010100 41 47 0.8542 0.4598 S010010 41 48 0.8542 0.4483 S000101 41 51 0.8542 0.4138 S100000 42 54 0.875 0.3793 S000011 42 59 0.875 0.3218 S000110 42 61 0.875 0.2989 S010000 42 65 0.875 0.2529 S000001 43 70 0.8958 0.1954 S000100 43 72 0.8958 0.1724 S000010 44 77 0.9167 0.1149 S000000 48 87 1 0

A sensitivity and specificity of 70% and 77%, respectively, were obtained using the sequence-based algorithm defined above.

Example 6 Optimization of Reagents and Staining Conditions for Immunohistochemistry

In order to maximize the signal to noise ratio for detection of expression of a particular biomarker using the immunohistochemistry methods disclosed herein, experiments to select the optimal antigen retrieval solution and conditions, antibody concentration and diluent formulation, and detection chemistry parameters were performed. For each set of experiments, biomarker-specific tissue microarrays (TMAs) were constructed by obtaining cylindrical tissue specimens from regular paraffin blocks, assembling them into a single block, and preparing sections containing multiple tissue specimens. TMAs with 2-3 pre-selected known positive and negative tumors for each breast biomarker were used. Slides were prepared and automated immunohistochemistry was performed essentially as described in Example 1. The following control reagents were used during all of the optimization experiments:

-   For the negative control, the application of the primary antibody was replaced with a ready-to-use universal negative reagent, either non-specific mouse or rabbit IgG.
-   EF1-α was used as a positive control.
-   A positive marker control slide was run following the optimized labeling parameters established during feasibility for each antibody being tested.
-   A biomarker specific TMA containing both positive and negative tumors was used in the testing of each breast marker antibody.

1. Optimization of Antigen Retrieval

A. Antigen Retrieval Solutions

Each antigen retrieval solution listed below was tested using each of the biomarker antibodies of interest. The time and temperatures used here were standard accepted values as defined below.

TABLE 36 Antigen Retrieval Solutions Tested
Solution                                       Time         Temperature         Device
Citrate Buffer pH 6.0 (Dako)                   5 minutes    120° C.             Pressure Cooker
Tris Buffer pH 9.5 (Biocare)                   5 minutes    120° C.             Pressure Cooker
EDTA pH 8.0 (Biocare)                          20 minutes   95° C.              Steamer
L.A.B. (Polysciences)                          20 minutes   20° C. and 60° C.   None/oven
Antigen Retrieval Glyca Solution (Biogenex)    5 minutes    120° C.             Pressure Cooker
Citrate Buffer Solution, pH 4.0 (Zymed)        20 minutes   95° C.              Steamer
diH₂O                                          20 minutes   120° C.             Pressure Cooker
Dawn (Procter & Gamble)                        3 minutes    120° C.             Pressure Cooker
2% Glacial Acetic Acid                         10 minutes   95° C.              Steamer

The slides were scored by a pathologist, and the best-performing antigen retrieval solutions were determined by comparing the labeling specificity and intensity between positive and negative tumors. If the results were essentially negative, alternative antigen retrieval solutions were screened. If the results were positive, i.e., labeling was more intense than with no antigen retrieval, the top (1-3) solutions were identified and used for antigen retrieval time and temperature testing. The activity of the selected antigen retrieval solutions was verified by labeling a representative sample of positive and negative whole tissue sections.

B. Antigen Retrieval Conditions—Time and Temperature

The best-performing antigen retrieval solutions were tested using the following time and temperature criteria:

TABLE 37 Antigen Retrieval Time and Temperature Conditions Tested 3 5 10 20 30 4 Over- Temp minutes minutes minutes minutes minutes hours night 2-8° C. * * 25° C. * * 37° C. * * 60° C. * * 95° C./ * * * ST 120° C./ * * * PC

The slides were scored by a pathologist, and the best-performing antigen retrieval time and temperature combinations were determined by comparing the labeling specificity and intensity between positive and negative tumors. The activity of the selected antigen retrieval solutions and time and temperature combinations was verified by labeling a representative sample of positive and negative whole tissue sections utilizing the controls listed above.

2. Optimization of Antibody Dilution and Diluent Formulations

A. Antibody Dilution

Each breast cancer biomarker antibody was tested over a range of antibody dilutions. Table 38 provides an example of antibody dilutions tested for the SLPI 5G6.24 antibody. All other breast biomarker antibodies were tested in a similar manner.

TABLE 38 Antibody Dilutions Tested
Antibody      IgG concentration        μg/slide (200 μl/slide)   Dilution
SLPI 5G6.24   3.5 mg/ml (3.5 μg/μl)    3.5                       1:200
                                       1.75                      1:400
                                       1.17                      1:600
                                       0.88                      1:800
                                       0.7                       1:1000
                                       0.47                      1:1500
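The μg/slide column of Table 38 follows directly from the stock concentration, the dilution, and the 200 μl applied per slide. A short sketch of that arithmetic is given below; the function name is hypothetical and introduced only for illustration.

```python
def igg_ug_per_slide(stock_mg_per_ml, dilution_factor, volume_ul=200.0):
    """Mass of IgG (in ug) delivered to one slide.

    stock_mg_per_ml: antibody stock concentration (3.5 mg/ml for SLPI 5G6.24)
    dilution_factor: 200 for a 1:200 dilution, and so on
    volume_ul:       volume applied per slide (200 ul in Table 38)
    """
    working_ug_per_ml = stock_mg_per_ml * 1000.0 / dilution_factor
    return working_ug_per_ml * volume_ul / 1000.0


# Recompute the ug/slide value for each dilution listed in Table 38.
for factor in (200, 400, 600, 800, 1000, 1500):
    print(f"1:{factor} -> {igg_ug_per_slide(3.5, factor):.2f} ug/slide")
```

The same function can be used to choose additional dilutions when the initial range does not bracket the upper and lower limits of acceptable labeling intensity.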

The slides were scored by a pathologist, and the labeling intensities between controls, known positive, and known negative tumors were assessed. The labeling data was analyzed to determine both the upper and lower limits of the antibody dilutions that maintained the desired labeling intensity and the width of the utility range for each antibody. If the initial dilution range tested did not result in the identification of the upper and lower limits, additional antibody dilutions were tested.

B. Antibody Diluent Formulation

Various antibody diluents were tested using each of the breast biomarker antibodies of interest. The table below provides a description of the diluent parameters that were tested.

TABLE 39 Antibody Diluents Tested
PBS pH 7.4
PBS pH 7.4   0.1% tween 20
PBS pH 7.4                   1% BSA
PBS pH 7.4                            0.05% NaN₃
PBS pH 7.4   0.1% tween 20   1% BSA
PBS pH 7.4   0.1% tween 20            0.05% NaN₃
PBS pH 7.4                   1% BSA   0.05% NaN₃
PBS pH 7.4   0.1% tween 20   1% BSA   0.05% NaN₃

The slides were scored by a pathologist for labeling intensity. The effectiveness of each diluent formulation was determined by comparing the labeling grade of the biomarker control slide to that of the experimental slides. The diluent formulations (approximately one to three) that gave the most specific labeling, the highest signal to noise ratio, and the optimal labeling intensity, as judged by comparing the labeling of positive and negative tumors, were carried forward into further optimization and stability studies. The activity of the selected diluents was verified by labeling a representative sample of positive and negative whole breast cancer tissue sections.

3. Optimization of Detection Chemistry

Each of the breast biomarker antibodies was tested utilizing the DAKO Envision+ detection kit over the range of times and concentrations listed below.

TABLE 40 Detection Chemistry Time and Concentration Conditions Tested
Time            10 minutes           30 minutes            60 minutes
Concentration   1.0X Concentration   0.75X Concentration   0.5X Concentration

The slides were scored by a pathologist, and the labeling intensities between controls, known positive, and known negative tumors were assessed. The activity of the selected detection chemistry time and concentration combinations was verified by labeling a representative sample of positive and negative whole breast cancer tissue sections.

Results

A significantly improved signal to noise ratio was observed with optimized staining reagent conditions (data not shown).

Example 7 Real-Time PCR Detection of Biomarkers in Clinical Samples

TaqMan® real-time PCR was performed with the ABI Prism 7700 Sequence Detection System (Applied Biosystems, Foster City, Calif.). The primers and probes were designed with the aid of the Primer Express™ program, version 1.5 (Applied Biosystems, Foster City, Calif.), for specific amplification of the targeted breast staging markers (e.g., DARPP32 and NDRG-1) in this study. The sequence information on primers and probes is shown below:

DARPP32:

Forward Primer Name: DARPP32_t1-F Sequence: TACACACCACCTTCGCTGAAAG (SEQ ID NO: 33)
Reverse Primer Name: DARPP32_t1-R Sequence: GGCCTGGTTCTCATTCAAATTG (SEQ ID NO: 34)
TaqMan Probe Name: DARPP32_t1-Probe Sequence: CGCATTGCTGAGTCTCACCTGCAGTC (SEQ ID NO: 35)
Forward Primer Name: DARPP32_t2-F Sequence: CAGCCTTACAGAGACTGGAAAAGAA (SEQ ID NO: 36)
Reverse Primer Name: DARPP32_t2-R Sequence: GAGGCTCAGGGACCCAAAG (SEQ ID NO: 37)
TaqMan Probe Name: DARPP32_t2-Probe Sequence: CCAAACCAAGGCCCCCAGAGAGGT (SEQ ID NO: 38)

NDRG-1:

Forward Primer Name: NDRG-1-F Sequence: CCTACCGCCAGCACATTGT (SEQ ID NO: 39)
Reverse Primer Name: NDRG-1-R Sequence: GCTGTTGTAGGCATTGATGAACA (SEQ ID NO: 40)
TaqMan Probe Name: NDRG-1-Probe Sequence: AATGACATGAACCCCGGCAACCTG (SEQ ID NO: 41)

The probes were labeled with the fluorescent dye FAM (6-carboxyfluorescein) on the 5′ base and the quenching dye TAMRA (6-carboxytetramethylrhodamine) on the 3′ base. The sizes of the amplicons were around 100 bp. 18S ribosomal RNA was used as the endogenous control. The 18S rRNA probe was labeled with the fluorescent dye VIC. A pre-developed 18S rRNA primer/probe mixture was purchased from Applied Biosystems (P/N: 4310893E). Twenty frozen breast tissues (i.e., 6 tumors with bad outcome, 12 tumors with good outcome, and 2 normal tissues) were analyzed in this study. Good outcome was defined as remaining cancer-free for at least 5 years; bad outcome was defined as suffering disease relapse, recurrence, or death within 5 years. Total RNA (5 μg) extracted from the frozen breast tissues was quantitatively converted into single-stranded cDNA with random hexamers (not with oligo-dT) by using the High-Capacity cDNA Archive Kit (Applied Biosystems, P/N: 4322171). The following reaction reagents were prepared:

20X Master Mix of Primers/Probe (in 200 μl)
180 μM Forward primer                                20 μl
180 μM Reverse primer                                20 μl
100 μM TaqMan probe                                  10 μl
H₂O                                                  150 μl

Final Reaction Mix (25 μl/well)
20X master mix of primers/probe                      1.25 μl
2X TaqMan Universal PCR master mix (P/N: 4304437)    12.5 μl
cDNA template                                        5.0 μl
H₂O                                                  6.25 μl

2× TaqMan Universal PCR Master Mix was purchased from Applied Biosystems (P/N: 4304437). The final primer and probe concentrations, in a total volume of 25 μl, were 0.9 μM and 0.25 μM, respectively. 10 ng of total RNA was applied to each well of the reaction. The amplification conditions were 2 min at 50° C., 10 min at 95° C., and a two-step cycle of 95° C. for 15 seconds and 60° C. for 60 seconds for a total of 40 cycles. At least three no-template control reaction mixtures were included in each run. All experiments were performed in triplicate.
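The stated final primer and probe concentrations can be checked against the reagent volumes listed above with a short calculation. The function and parameter names are hypothetical and serve only to make the two dilution steps explicit.

```python
def final_concentration_um(stock_um, stock_ul,
                           mix_total_ul=200.0,
                           mix_ul_per_well=1.25,
                           well_ul=25.0):
    """Final concentration (uM) of a primer or probe in one reaction well.

    Step 1: the stock is diluted into the 200 ul primer/probe master mix.
    Step 2: 1.25 ul of that master mix is diluted into a 25 ul reaction.
    """
    mix_um = stock_um * stock_ul / mix_total_ul
    return mix_um * mix_ul_per_well / well_ul


primer_um = final_concentration_um(180.0, 20.0)  # 180 uM stock, 20 ul
probe_um = final_concentration_um(100.0, 10.0)   # 100 uM stock, 10 ul
print(f"primer: {primer_um:.2f} uM, probe: {probe_um:.2f} uM")
```

The 1.25 μl of master mix in a 25 μl reaction is a 20-fold dilution, which is why the primer/probe mix is labeled 20X.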

At the end of each reaction, the recorded fluorescence intensity was used for the following calculations: Rn⁺ is the Rn value of a reaction containing all components, and Rn⁻ is the Rn value of an unreacted sample (the baseline value or the value detected in the no-template control, NTC). ΔRn is the difference between Rn⁺ and Rn⁻ and is an indicator of the magnitude of the signal generated by the PCR. The expression level of a target gene was computed by the comparative CT method. This method does not use a standard of known amount but compares the relative amount of the target sequence to the reference values chosen (18S rRNA was selected as the reference in this study). See the Applied Biosystems TaqMan Human Endogenous Control Plate Protocol, which contains detailed instructions regarding MS Excel based data analysis.
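For illustration, the comparative CT calculation is commonly written as a fold change of 2^(−ΔΔCT), where ΔCT is the target CT minus the reference (here 18S rRNA) CT. The sketch below assumes approximately 100% amplification efficiency and a calibrator sample chosen by the analyst; the specification does not state which sample served as calibrator, and the CT values shown are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene by the comparative CT method.

    delta CT       = CT(target) - CT(reference), per sample
    delta delta CT = delta CT(sample) - delta CT(calibrator)
    fold change    = 2 ** (-delta delta CT), assuming ~100% PCR efficiency
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


# Hypothetical CT values: tumor sample versus a calibrator sample.
fold = relative_expression(ct_target=24.0, ct_reference=12.0,
                           ct_target_calibrator=26.0,
                           ct_reference_calibrator=12.0)
```

In this hypothetical case the target crosses threshold two cycles earlier in the sample than in the calibrator after normalization, which corresponds to a four-fold higher relative expression.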

Results

The results obtained with each biomarker and with the specific primers are listed below in tabular form. Results obtained with normal breast tissue samples are designated N; those obtained with breast cancer samples are labeled T.

TABLE 41 DARPP32 TaqMan ® Results
Samples      t1      t2      t1t2
2T           0.18    0.5     0.54
7T           5.7     23.5    62.5
12T          73.5    16.9    84.2
13T          1.2     1.1     2.2
21T          5.8     6.1     16.1
24T          4.2     2.9     7.9
26T          0.6     0.3     1.9
1T           0.02    0.2     0.1
3T           0.4     0.04    0.8
4T           2.5     1       4.8
5T           1.2     0.5     3.7
6T           0.9     0.6     2.6
9T           0.3     0.6     0.5
10T          0.1     0.2     0.3
11T          0.7     0.1     0.9
19T          0.8     0.3     1.6
22T          0.6     0.6     1.6
23T          0.5     0.4     1.2
25T          0.2     0.1     0.3
1N           1.1     1.3     2
8N           0.7     0.3     1.3
Bad Mean:    15.10   8.50    28.91
Good Mean:   0.69    0.39    1.53
t-test P =   0.046   0.004   0.007

DARPP32 has two transcripts: t1 and t2. TaqMan® data showed that both t1 and t2 were overexpressed in the breast tumors with bad outcomes (in bold) as compared with those with good outcomes.

TABLE 42 NDRG-1 TaqMan ® Results
Samples      NDRG-1
2T           2.8
7T           12.8
12T          5.5
13T          6.4
21T          2.4
24T          6.7
26T          2.3
1T           4.1
3T           4.2
4T           2.8
5T           3.2
6T           1.3
9T           3.1
10T          3.7
11T          1.6
19T          3.4
22T          5.5
23T          1.6
25T          3.1
1N           0.9
8N           0.5
Bad Mean:    6.10
Good Mean:   3.13
t-test P =   0.021

NDRG-1 has one transcript. TaqMan data showed that NDRG-1 was overexpressed in the breast tumors with bad outcomes (in bold) as compared with those with good outcomes.

Example 8 Detection of Biomarker Overexpression in a Chemo-Naïve Patient Population with 10-Year Clinical Follow-Up (Five Biomarker Panel)

Breast tumor tissue samples collected at or near the time of initial diagnosis from 255 early-stage breast cancer patients were analyzed for biomarker overexpression in this study. Ten-year clinical follow-up data was available for all patients in the study. None of the patients received cytotoxic chemotherapy at any time during their treatment for breast cancer. The clinical demographics, distribution, and standard histopathological parameters (e.g., ER/PR hormone receptor status, histological grade, etc.) for the patient population are summarized below in Table 43.

TABLE 43 Clinical Characteristics of Chemo-Naïve Patient Population
Characteristics              Overall
Age at diagnosis (years)     n = 255
  Mean (std)                 64.0 (10.6)
  Range                      30-85
Age group distribution
  <40                        6 (2.4%)
  40-<50                     23 (9.0%)
  50-<60                     48 (18.8%)
  60-<70                     87 (34.1%)
  >=70                       91 (35.7%)
Tumor size (cm)              n = 255
  Mean (std)                 2.1 (1.19)
  Range                      0.3-11.0
Tumor size group
  <1.0                       16 (6.3%)
  1.0-<2.0                   104 (40.8%)
  2.0-<4.0                   122 (47.8%)
  >=4.0                      13 (5.1%)
Lymph node status            n = 255
  Negative                   232 (91.0%)
  Positive                   23 (9.0%)
Histological Grade           n = 244
  1                          38 (15.6%)
  2                          135 (55.3%)
  3                          71 (29.1%)
ER Status                    n = 249
  Negative                   64 (25.7%)
  Positive                   185 (74.3%)
Her2/neu status              n = 249
  Negative                   176 (70.7%)
  Positive                   73 (29.3%)

Detection of expression of a five biomarker panel comprising SLPI, src, PSMB9, p21ras, and E2F1 was performed essentially as described above. That is, breast tumor samples were prepared and stained for biomarker expression using the Dako Autostainer, as described above in Example 1. Biomarker overexpression was determined using the imaging analysis described in Example 4.

The prognostic performance of the 5 biomarker panel was assessed utilizing a Cox Proportional Hazards Model analysis. See, for example, Spruance et al., supra. The prognostic value of each biomarker and/or histological characteristic for distinguishing the patients who suffered disease recurrence or death within ten years from the patients who were disease-free after ten years was calculated. In the analysis without the biomarker panel, age and tumor size were found to be independent prognostic factors with a p value<0.05. When the biomarkers were added to this analysis, they exhibited the highest statistically significant independent prognostic utility with a p value of <0.0001. The results of the Cox Proportional Hazard analysis are summarized below in Table 44.

TABLE 44 Results of Cox Proportional Hazard Analysis with Chemo-Naïve Patient Population (SLPI, src, PSMB9, p21ras, and E2F1 Biomarker Panel)
Variable                        P Value    Hazard Ratio (95% CI)
Analysis (without Biomarkers)
  Age at Diagnosis              0.0002     1.05 (1.02, 1.08)
  Tumor Size                    0.0066     1.28 (1.07, 1.53)
  ER                            0.2506     1.40 (0.79, 2.50)
  Total Grade                   0.0674     1.39 (0.98, 1.99)
Analysis (with Biomarkers)*
  Age at Diagnosis              0.0004     1.05 (1.02, 1.08)
  Tumor Size                    0.0318     1.21 (1.02, 1.44)
  ER                            0.0134     2.20 (1.18, 4.12)
  Total Grade                   0.0845     1.37 (0.96, 1.96)
  TPO Marker                    <0.0001    1.92 (1.47, 2.50)
*Age at diagnosis was a continuous variable, and the biomarker was an ordinal variable with values of 0, 1, 2, 3, or 4 (0 = no positive markers, 1 = one positive marker, and 2, 3, or 4 = two, three, or four positive markers).
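As a consistency check on Table 44, a hazard ratio and its 95% confidence interval can be converted back into an approximate p value. The sketch below assumes that the intervals are Wald intervals on the log-hazard scale, as is conventional for Cox model output; this is an assumption, not a statement of the software actually used, and the rounding of the tabulated values limits the precision of the result.

```python
import math


def wald_p_from_ci(hazard_ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided p value from a hazard ratio and its 95% CI.

    Assumes CI = exp(beta +/- z_crit * SE), where beta = ln(hazard ratio).
    """
    beta = math.log(hazard_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Rows from Table 44: tumor size (without biomarkers) and the TPO marker.
p_tumor_size = wald_p_from_ci(1.28, 1.07, 1.53)
p_marker = wald_p_from_ci(1.92, 1.47, 2.50)
print(f"tumor size: p ~ {p_tumor_size:.4f}; marker: p ~ {p_marker:.1e}")
```

Under these assumptions the recomputed values fall close to the tabulated 0.0066 for tumor size and below 0.0001 for the biomarker variable.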

The prognostic performance of the SLPI, src, PSMB9, p21ras, and E2F1 biomarker panel is graphically presented in the Kaplan-Meier plot of FIG. 3. The x-axis represents years from initial diagnosis, and the y-axis is the percentage of disease-free survival. The corresponding graph for the general breast cancer population independent of biomarker analysis is presented in FIG. 4. These plots demonstrate the ability of this biomarker panel to risk stratify this early stage breast cancer patient population for disease recurrence and/or death due to primary disease. The risk of recurrence and/or death due to primary disease increases as the number of biomarkers that are overexpressed in the patient samples increases. The disease-free survival rates of the patient subgroups identified by the number of overexpressed biomarkers are statistically significantly different from each other with a p value of <0.001, as determined by log-rank test for comparison of 0 positive, 1 positive, 2 positive, 3 or more positive biomarker groups. A biomarker that is classified as overexpressed by the imaging analyses described herein is deemed “positive.”
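A Kaplan-Meier curve of the kind plotted in the figures is built with the product-limit estimator. The following is a minimal sketch of that estimator for a single patient group; the follow-up times and event flags are hypothetical and are not study data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of disease-free survival.

    times:  follow-up time for each patient (years from diagnosis)
    events: True if recurrence or death occurred at that time,
            False if the patient was censored (still disease-free)
    Returns a list of (time, survival fraction) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


# Hypothetical group of six patients.
curve = kaplan_meier([2, 3, 3, 5, 8, 10],
                     [True, True, False, True, False, False])
```

Computing one such curve per subgroup (for example, 0, 1, 2, and 3 or more positive biomarkers) gives the stratified plot; comparing the curves would additionally require a log-rank test, which is not shown here.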

Because one of the most important clinical features of a breast cancer patient's diagnosis relates to estrogen receptor (ER) status, the prognostic performance of the SLPI, src, PSMB9, p21ras, and E2F1 biomarker panel was further assessed using the Cox Proportional Hazard analysis in the ER-positive and ER-negative patient subgroups. Clinical management and prognosis of these two subgroups differ because ER-positive patients are candidates for tamoxifen therapy whereas ER-negative patients are not. The results of the analysis are summarized below in Table 45. The data indicate that the five biomarkers of interest have prognostic utility in both the ER-positive and ER-negative breast cancer patient subgroups. Therefore, while the biomarkers SLPI, src, PSMB9, p21ras, and E2F1 are indicative of prognosis independent of the patient's ER status, these biomarkers also correlate with ER status.

TABLE 45 Results of Cox Proportional Hazard Analysis with Chemo-Naïve Patient Population (SLPI, src, PSMB9, p21ras, and E2F1 Biomarker Panel in ER Positive and Negative Patient Subgroups)
Variable                       P Value    Hazard Ratio (95% CI)
ER Positive
Analysis without Biomarker
  Age at Diagnosis             0.0012     1.05 (1.02, 1.09)
  Tumor Size                   0.0237     1.25 (1.03, 1.51)
  HER2                         0.5732     1.18 (0.66, 2.13)
  Total Grade                  0.0566     1.47 (0.99, 2.20)
Analysis with Biomarker*
  Age at Diagnosis             0.0009     1.06 (1.03, 1.10)
  Tumor Size                   0.0753     1.19 (0.98, 1.43)
  HER2                         0.8523     1.06 (0.58, 1.93)
  Total Grade                  0.0440     1.50 (1.01, 2.23)
  TPO Marker                   <0.0001    1.98 (1.46, 2.69)
ER Negative
Analysis without Biomarker
  Age at Diagnosis             0.0771     1.04 (1.00, 1.09)
  Tumor Size                   0.1527     1.51 (0.86, 2.64)
  HER2                         0.2562     0.55 (0.19, 1.55)
  Total Grade                  0.9883     1.01 (0.45, 2.24)
Analysis with Biomarker*
  Age at Diagnosis             0.3467     1.03 (0.97, 1.08)
  Tumor Size                   0.1854     1.44 (0.84, 2.48)
  HER2                         0.6577     0.78 (0.27, 2.30)
  Total Grade                  0.7327     0.86 (0.38, 1.99)
  TPO Marker                   0.0089     1.91 (1.18, 3.09)
*Age at diagnosis was a continuous variable, and the TPO marker was an ordinal variable with values of 0, 1, 2, 3, or 4 (0 = no positive markers, 1 = one positive marker, and 2, 3, or 4 = two, three, or four positive markers).

Example 9 Detection of Biomarker Overexpression in a Chemo-Naïve Patient Population with 10-Year Clinical Follow-up (Six Biomarker Panel)

Breast tumor tissue samples from 100 patients (50 good outcome; 50 bad outcome patients) from the chemo-naïve patient population described in Example 8 were analyzed for biomarker overexpression of six biomarkers of interest (SLPI, src, PSMB9, p21^(ras), E2F1, and MUC-1). Detection of expression of the six biomarker panel was performed by automated immunohistochemistry essentially as described above except that an alternate staining platform, the Ventana BenchMark XT, was used in place of the Dako Autostainer. A standard manual for operating the Ventana BenchMark XT is readily available from the manufacturer. Additional modifications to the immunohistochemistry parameters used with the Ventana BenchMark XT staining platform are summarized in Table 46 below. Biomarker overexpression was determined as before using the imaging analysis described in Example 4.

TABLE 46 Immunohistochemistry Parameters for Biomarker Staining with the Ventana BenchMark XT Staining Platform
            Antibody        Antigen     Antigen     Antibody     Antibody
            Concentration   Retrieval   Retrieval   Incubation   Incubation   Block and
Biomarker   (μg/ml)         Solution    Time        Temp         Time         Amplification
SLPI        3.6             CC1         Extended    RT           1 hr         None
E2F1        2.0             CC1         Extended    37° C.       16 min       Pro & Biotin Amp
SRC         40              CC2         Standard    37° C.       1 hr         None
p21^(ras)   13.7            CC1         Short       37° C.       12 min       None
PSMB9       6.5             CC2         Standard    RT           1 hr         None
MUC1        5.0             CC1         Extended    37° C.       1 hr         None
*CC1 and CC2 refer to cell conditioning reagents commercially available from Ventana. With respect to antigen retrieval times: short = 30 min; standard = 60 min; and extended = 90 min.

The prognostic performance of the 6 biomarker panel was assessed utilizing a Cox Proportional Hazards Model analysis, as above. The prognostic value of each biomarker and/or histological characteristic for distinguishing the patients who suffered disease recurrence or death within ten years from the patients who were disease-free after ten years was calculated. The biomarkers of interest (SLPI, src, PSMB9, p21^(ras), E2F1, and MUC-1) exhibited statistically significant prognostic utility with a p value of 0.0220. The results of the Cox Proportional Hazard analysis are summarized below in Table 47.

TABLE 47 Results of Cox Proportional Hazard Analysis with Chemo-Naïve Patient Population (SLPI, src, PSMB9, p21ras, E2F1, and MUC-1 Biomarker Panel)
                                                                        95% Hazard Ratio
Variable                                      P Value   Hazard Ratio    Confidence Limits
Age at Diagnosis                              0.0523    1.032           1.000    1.066
Tumor Size                                    0.0180    1.319           1.049    1.658
Her2                                          0.2619    0.640           0.293    1.396
ER                                            0.4539    1.359           0.609    3.035
Total Grade                                   0.7693    1.075           0.661    1.749
Biomarkers (SLPI, src, PSMB9, p21ras,         0.0220    1.335           1.042    1.709
and E2F1 Biomarker Panel)

The prognostic performance of the SLPI, src, PSMB9, p21ras, E2F1, and MUC-1 biomarker panel is graphically presented in the Kaplan-Meier plot of FIG. 5. The x-axis represents years from initial diagnosis, and the y-axis is the percentage of disease-free survival. This plot demonstrates the ability of this biomarker panel to risk stratify this early stage breast cancer patient population for disease recurrence and/or death due to primary disease. The risk of recurrence and/or death due to primary disease increases as the number of biomarkers that are overexpressed in the patient samples increases. The disease-free survival rates of the patient subgroups identified by the number of overexpressed biomarkers are statistically significantly different from each other with a p value of <0.0065, as determined by log-rank test for comparison of 0 positive, 1 positive, 2 positive, 3 or more positive biomarker groups. As described above, a biomarker that is classified as overexpressed by the imaging analyses described herein is deemed “positive.”

TABLE 48
Biomarker Nucleotide and Amino Acid Sequence Information

                 Nucleotide Sequence                    Amino Acid Sequence
Biomarker Name   Accession No.   Sequence Identifier    Accession No.   Sequence Identifier
SLPI             NM_003064       SEQ ID NO: 1           NP_003055       SEQ ID NO: 2
DARPP-32         NM_032192       SEQ ID NO: 3           NP_115568       SEQ ID NO: 4
MGC14832         NM_032339       SEQ ID NO: 5           NP_115715       SEQ ID NO: 6
NDRG-1           NM_006096       SEQ ID NO: 7           NP_006087       SEQ ID NO: 8
PSMB9            NM_002800       SEQ ID NO: 9           NP_002791       SEQ ID NO: 10
p27              NM_004064       SEQ ID NO: 11          NP_004055       SEQ ID NO: 12
E2F1             NM_005225       SEQ ID NO: 13          NP_005216       SEQ ID NO: 14
MCM6             NM_005915       SEQ ID NO: 15          NP_005906       SEQ ID NO: 16
MCM2             D83987          SEQ ID NO: 17          BAA12177        SEQ ID NO: 18
MUC-1            NM_182741       SEQ ID NO: 19          NP_877418       SEQ ID NO: 20
p21ras           NM_005343       SEQ ID NO: 21          NP_005334       SEQ ID NO: 22
Src              NM_005417       SEQ ID NO: 23          NP_005408       SEQ ID NO: 24
TGF-beta3        BC018503        SEQ ID NO: 25          AAH18503        SEQ ID NO: 26
PDGFRalpha       M21574          SEQ ID NO: 27          AAA96715        SEQ ID NO: 28
Myc              V00568          SEQ ID NO: 29          CAA23831        SEQ ID NO: 30
SERHL            NM_014509       SEQ ID NO: 31          NP_055324       SEQ ID NO: 32

All publications and patent applications mentioned in the specification are indicative of the level of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims. 

1. A method for evaluating the prognosis of a breast cancer patient, said method comprising detecting overexpression of at least one biomarker in a sample from said patient, wherein said biomarker is selected from the group consisting of SLPI, p21ras, MUC-1, DARPP-32, phospho-p27, src, MGC 14832, myc, TGFβ-3, SERHL, E2F1, PDGFRα, NDRG-1, MCM2, PSMB9, and MCM6, wherein overexpression of said biomarker is indicative of prognosis, and thereby evaluating the prognosis of said breast cancer patient.
 2. A method for evaluating the prognosis of a breast cancer patient, said method comprising: a) obtaining a sample from said patient; b) contacting said sample with at least one antibody, wherein said antibody specifically binds to a biomarker protein, wherein said biomarker protein is selected from the group consisting of SLPI, p21ras, MUC-1, DARPP-32, phospho-p27, src, MGC 14832, myc, TGFβ-3, SERHL, E2F1, PDGFRα, NDRG-1, MCM2, PSMB9, and MCM6; c) detecting binding of said antibody to said biomarker protein; d) determining if said biomarker protein is overexpressed in said sample, wherein overexpression of said biomarker protein is indicative of a poor prognosis; and, e) thereby evaluating the prognosis of said breast cancer patient.
 3. The method of claim 2, wherein said biomarkers are selected from the group consisting of SLPI, p21ras, MUC-1, DARPP-32, phospho-p27, src, MGC 14832, myc, TGFβ-3, SERHL, E2F1, PDGFRα, NDRG-1, MCM2, PSMB9, and MCM6.
 4. A kit comprising at least two antibodies, wherein each of said antibodies specifically binds to a distinct biomarker protein that is indicative of poor prognosis of a breast cancer patient, and wherein the biomarker proteins are selected from the group consisting of SLPI, p21ras, MUC-1, DARPP-32, phospho-p27, src, MGC 14832, myc, TGFβ-3, SERHL, E2F1, PDGFRα, NDRG-1, MCM2, PSMB9, and MCM6.
 5. The kit of claim 4, wherein said biomarker proteins are selected from the group consisting of E2F1, SLPI, MUC-1, src, p21ras, and PSMB9.
 6. The kit of claim 4, wherein said kit further comprises chemicals for the detection of antibody binding to said biomarker protein.
 7. The kit of claim 4, wherein said kit is used with a commercial antibody binding detection system.
 8. The kit of claim 4, wherein said kit further comprises a positive control sample.
 9. The kit of claim 4, wherein said kit further comprises instructions for use. 